Nucleic acid compositions conferring altered metabolic characteristics

ABSTRACT

This invention encompasses the identification and isolation of genes and gene fragments that confer altered metabolic characteristics in Nicotiana benthamiana plants when expressed using GENEWARE™ viral vectors. These genes are derived from a variety of sources. Expression of these genes resulted in alterations of the levels of at least one of the following metabolites: acids, fatty acids, amino acids and related compounds, branched fatty acids, carbohydrates, hydrocarbons, alkaloids and other bases, esters, glycerides, phenols and related compounds, alcohols, alkenes and alkynes, sterols, oxygenated terpenes, and other isoprenoids, and ketones and quinones.

FIELD OF THE INVENTION

This invention relates to deoxyribonucleic acid (DNA) and amino acid sequences that confer altered metabolic characteristics in plants.

BACKGROUND OF THE INVENTION

Plants are photosynthetic organisms able to fix inorganic carbon (CO₂) in organic matter via energy from light and minerals contained in water. All carbons fixed primarily via the pentose/triose phosphate cycle are converted in numerous anabolic pathways necessary to sustain life (primary metabolism). To survive, plants must adapt to their environment and synthesize an extremely wide range of organic compounds required to interact with the elements of their microenvironment (secondary metabolism). To capture the biochemical diversity of this particular kingdom, both primary and secondary metabolism have to be taken into account. The primary metabolism is represented by the biosynthesis of building blocks of macromolecules such as amino acids, fatty acids, carbohydrates, and sterols.

Each of these groups of compounds is of economic importance. Fatty acids can also be used as a raw material for industrial applications in a variety of products, including soaps, lubricants, paints, detergents, adhesives, and plasticizers. Furthermore, fatty acids are the major components of edible oils. For example, fatty acid compounds are involved in building blocks for protection (cell membrane, epicuticular polymers), storage of energy in the plant seeds, and as secondary messengers in the plant cell. As another example, carbohydrates are intermediates in the biosynthesis of energy reserves (starch, cellulose) and building blocks of the cell wall giving the plant shape and structure. The carbohydrates are the carbon skeletons of many biosynthetic reactions. As such, the ability to alter carbohydrate metabolism could lead to many improvements in plants, including increased transport and accumulation of starch by accumulation of hexose phosphate that could improve starch yield in the seed and the plant; alterations in the cell wall for better resistance to pests and drought; better digestibility for forage plants; and better processivity for pulp production in the paper industry (e.g., less lignin and hemicellulose).

The advent of modern biology, particularly molecular biology and genetics, has opened up new avenues for altering the production of compounds of economic importance by plants. Scientists have focused on utilizing recombinant DNA (rDNA) methods that allow new varieties of plants to be produced much faster than by conventional breeding. rDNA techniques allow the introduction of genes from distantly related species or even from different biological kingdoms into crop plants, conferring traits that provide significant agronomic advantages. Furthermore, detailed knowledge of the traits being introduced, such as cellular function and localization, can lead to less variability in offspring, and fine-tuning of secondary effects (e.g., permitting variation from what is customarily observed). After a trait has been introduced into a plant by transgenic methods, conventional breeding can be used to hybridize the transgenic line with useful varieties and elite germplasms, resulting in crops containing numerous advantageous properties.

Most efforts to engineer plants with specific traits thus far have been based on the rational design paradigm of transforming a plant with a gene of known function with the intent of introducing a known trait. As agricultural biotechnology hurtles into the genomics and post-genomics era, the massive amounts of genetic and functional data being generated are being used to direct the search for genes that can be utilized with recombinant methods. However, if the use of this information is limited to the rational design paradigm, the identification of genes with truly profound effects on the production of desired compounds by plants could be extremely slow.

Accordingly, what is needed in the art are methods for rapidly screening and identifying gene sequences and polypeptide sequences of previously unknown function whose expression causes altered metabolic characteristics in biological systems, including, but not limited to, plants.

SUMMARY OF THE INVENTION

This invention relates to deoxyribonucleic acid (DNA) and amino acid sequences that confer altered metabolic characteristics in plants. In some embodiments, the present invention provides polynucleotides and polypeptides that confer altered metabolic characteristics when expressed in plants. The present invention is not limited to the alteration of amounts or levels of any particular metabolite. Indeed, the alteration of the levels or amounts of a variety of metabolites is contemplated, including, but not limited to, acids, fatty acids, amino acids, hydroxy fatty acids, branched fatty acids, carbohydrates, hydrocarbons, glycerides, phenols, sterols, oxygenated terpenes, and other isoprenoids, alcohols, alkenes and alkynes. The present invention is not limited to any particular polypeptide or polynucleotide sequences that confer altered metabolic characteristics. Indeed, a variety of such sequences are contemplated. Accordingly, in some embodiments the present invention provides an isolated nucleic acid selected from the group consisting of SEQ ID NOs: 1-7554 and nucleic acid sequences that hybridize to any thereof under conditions of low stringency, wherein expression of the isolated nucleic acid in a plant results in an altered metabolic characteristic.

In some embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 162, 212, 3781, 3970, 3990, 492, 3796, 3975, and 4028, wherein expression of the nucleic acid in a plant results in altered acid metabolism. In other embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 4049, 210, 4045, 229, 3825, 4015, 3835, 4039, 1048 and 1106, wherein expression of the nucleic acid in a plant results in altered alcohol metabolism. In still other embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 7548, 283, 3957, 3734, 3739, 3797, 7516, 3762, 4020 and 1062, wherein expression of the nucleic acid in a plant results in altered fatty acid metabolism. In further embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 1148, 4147, 273, 281, 299, 3920, 450, 7463 and 4074, wherein expression of the nucleic acid in a plant results in altered branched fatty acid metabolism.

In still further embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 258, 456, 3859, 3817, 4018, 3848, 3862, 4008 and 1000, wherein expression of the nucleic acid in a plant results in altered alkaloid or other base metabolism. In some embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 372, 3714, 3717, 3963, 3775, 3757, 7462, 3743, 3744 and 7480, wherein expression of the nucleic acid in a plant results in altered amino acid metabolism.

In some other embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 7404, 180, 181, 225, 231, 366, 3983, 3833, 1121 and 1062, wherein expression of the nucleic acid in a plant results in altered ester metabolism. In some further embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 3773, 583, 3821, 7403, 988, 1002, 1007 and 1129, wherein expression of the nucleic acid in a plant results in altered glyceride metabolism. In still other embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 150, 7410, 175, 7553, 619, 1078, 1122 and 1124, wherein expression of the nucleic acid in a plant results in altered phenolic compound metabolism.

In further embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 3891, 7545, 7551, 4121, 157, 159, 7411, 3792, 3799 and 3997, wherein expression of the nucleic acid in a plant results in altered carbohydrate metabolism. In other embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 7405, 7406, 173, 183, 220, 227, 3778, 3803, 3847 and 1005, wherein expression of the nucleic acid in a plant results in altered sterol, oxygenated terpene, or isoprenoid metabolism. In still other embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 7408, 351, 378, 3864, 4103, 996, 1006 and 1098, wherein expression of the nucleic acid in a plant results in altered alkene or alkyne metabolism.

In further embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 177, 7442, 4038, 3836, 3855, 1012, 1015, 1119 and 1024, wherein expression of the nucleic acid in a plant results in altered hydrocarbon metabolism. In still further embodiments, the present invention provides an isolated nucleic acid selected from SEQ ID NOs: 360, 4001, 3703, 7399, 645, 3849 and 7552, wherein expression of the nucleic acid in a plant results in altered ketone or quinone metabolism.

In further preferred embodiments, the present invention provides vectors comprising the foregoing polynucleotide sequences. In still further embodiments, the foregoing sequences are operably linked to an exogenous promoter, most preferably a plant promoter. However, the present invention is not limited to the use of any particular promoter. Indeed, the use of a variety of promoters is contemplated, including, but not limited to, 35S, 19S, heat shock, and Rubisco promoters, subgenomic promoters such as the CaMV promoter and TMV coat protein promoter, and dual promoter systems such as DHSPES (see U.S. Pat. No. 6,303,848, incorporated herein by reference). In some embodiments, the nucleic acid sequences of the present invention are arranged in sense orientation, while in other embodiments, the nucleic acid sequences are arranged in the vector in antisense orientation. In still further embodiments, the present invention provides a plant comprising one of the foregoing nucleic acid sequences or vectors, as well as seeds, leaves, roots, stems and fruit from the plant. In some particularly preferred embodiments, the present invention provides at least one of the foregoing sequences for use in conferring altered metabolism in a plant.

In still other embodiments, the present invention provides processes for making a transgenic plant comprising providing a vector as described above and a plant, and transfecting the plant with the vector. In other preferred embodiments, the present invention provides processes for providing an altered metabolic characteristic in a plant or population of plants comprising providing a vector as described above and a plant, and transfecting the plant with the vector under conditions such that an altered metabolic characteristic is conferred by expression of the isolated nucleic acid from the vector. In still further embodiments, the present invention provides an isolated nucleic acid selected from the group consisting of SEQ ID NOs: 1-7554 and nucleic acid sequences that hybridize to any thereof under conditions of low stringency for use in producing a plant with altered metabolism. In other embodiments, the present invention provides an isolated nucleic acid, composition or vector substantially as described herein in any of the examples or claims.

DESCRIPTION OF THE DRAWINGS

FIG. 1 presents the contig sequences corresponding to SEQ ID NOs: 1-1165, 3703-4153, and 7389-7554.

FIG. 2 presents homologous sequences corresponding to SEQ ID NOs: 1166-3702 and 4154-7388.

FIG. 3 is a table of BLAST search results from public databases.

FIG. 4 is a table of BLAST search results from the Derwent™ amino acid database.

FIG. 5 is a table of BLAST search results from the Derwent™ nucleotide database.

FIG. 6 provides a summary of the metabolic alterations caused by expression of the indicated sequences. NQ = present in reference, but not detected or below the limit of quantification in the sample.

FIGS. 7a-d summarize the gas chromatography flame ionization detection (GC/FID) parameters used to analyze metabolite samples.

FIG. 8 provides a list of SEQ IDs, grouped by the respective fractionation chemistry, observed to confer altered metabolic characteristics in plants based exclusively upon a pattern recognition automated data analysis technique (ADA).

FIG. 9 provides tables exemplifying the functional correlation between sequences that share homology.

DEFINITIONS

Before the present proteins (including their fragments and peptides), nucleotide sequences, and methods are described, it should be noted that this invention is not limited to the particular methodology, protocols, cell lines, vectors, and reagents described herein as these may vary. It should also be understood that the terminology used herein is for the purpose of describing particular aspects of the invention, and is not intended to limit its scope.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a host cell” includes a plurality of such host cells, reference to “the antibody” is a reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods, devices, and materials are now described. All publications mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the cell lines, vectors, and methodologies that are reported in the publications that might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

“Acylate”, as used herein, refers to the introduction of an acyl group into a molecule (for example, acylation).

“Adjacent”, as used herein, refers to a position in a nucleotide sequence immediately 5′ or 3′ to a defined sequence.

“Agonist”, as used herein, refers to a molecule that, when bound to a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), increases the biological or immunological activity of the polypeptide. Agonists may include proteins, nucleic acids, carbohydrates, or any other molecules that bind to the protein.

“Allele” or “allelic sequence”, as used herein, refers to an alternative form of the gene that may result from at least one mutation in the nucleic acid sequence.

“Altered”, as used herein, refers to a modification in the metabolic profile compared to a reference or control where the amount of a biochemical and/or chemical compound is increased or decreased.

“Alterations” in a polynucleotide (for example, a polypeptide encoded by a nucleic acid of the present invention), as used herein, comprise any deletions, insertions, and point mutations in the polynucleotide sequence. Included within this definition are alterations to the genomic DNA sequence that encodes the polypeptide.

“Amino acid sequence”, as used herein, refers to an oligopeptide, peptide, polypeptide, or protein sequence, and fragments or portions thereof, and to naturally occurring or synthetic molecules. “Amino acid sequence” and like terms, such as “polypeptide” or “protein” as recited herein are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule.

“Amplification”, as used herein, refers to the production of additional copies of a nucleic acid sequence and is generally carried out using polymerase chain reaction (PCR) technologies well known in the art (Dieffenbach, C. W. and G. S. Dveksler (1995) PCR Primer, a Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y.).

“Antibody” refers to intact molecules as well as fragments thereof that are capable of specific binding to an epitopic determinant. Antibodies that bind a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) can be prepared using intact polypeptides or fragments as the immunizing antigen. These antigens may be conjugated to a carrier protein, if desired.

“Antigenic determinant”, “determinant group”, or “epitope of an antigenic macromolecule”, as used herein, refer to any region of the macromolecule with the ability or potential to elicit, and combine with, specific antibody. Determinants exposed on the surface of the macromolecule are likely to be immunodominant, that is, more immunogenic than other (immunorecessive) determinants that are less exposed, while some (for example, those within the molecule) are non-immunogenic (immunosilent). As used herein, “antigenic determinant” refers to that portion of a molecule that makes contact with a particular antibody (for example, an epitope). When a protein or fragment of a protein is used to immunize a host animal, numerous regions of the protein may induce the production of antibodies that bind specifically to a given region or three-dimensional structure on the protein; these regions or structures are referred to as antigenic determinants. An antigenic determinant may compete with the intact antigen (the immunogen used to elicit the immune response) for binding to an antibody.

“Antisense”, as used herein, refers to a deoxyribonucleotide sequence whose sequence of deoxyribonucleotide residues is in reverse 5′ to 3′ orientation in relation to the sequence of deoxyribonucleotide residues in a sense strand of a DNA duplex. A “sense strand” of a DNA duplex refers to a strand in a DNA duplex that is transcribed by a cell in its natural state into a “sense mRNA”. Thus an “antisense” sequence is a sequence having the same sequence as the non-coding strand in a DNA duplex. The term “antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target gene by interfering with the processing, transport and/or translation of its primary transcript or mRNA. The complementarity of an antisense RNA may be with any part of the specific gene transcript, for example, at the 5′ non-coding sequence, 3′ non-coding sequence, introns, or the coding sequence. In addition, as used herein, antisense RNA may contain regions of ribozyme sequences that increase the efficacy of antisense RNA to block gene expression. “Ribozyme” refers to a catalytic RNA and includes sequence-specific endoribonucleases.

“Anti-sense inhibition”, as used herein, refers to a type of gene regulation based on cytoplasmic, nuclear, or organelle inhibition of gene expression due to the presence in a cell of an RNA molecule complementary to at least a portion of the mRNA being translated. It is specifically contemplated that DNA molecules may be from either an RNA virus or mRNA from the host cell genome or from a DNA virus.

“Antagonist” or “inhibitor”, as used herein, refer to a molecule that, when bound to a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), decreases the biological or immunological activity of the polypeptide. Antagonists and inhibitors may include proteins, nucleic acids, carbohydrates, or any other molecules that bind to the polypeptide.

“Biologically active”, as used herein, refers to a molecule having the structural, regulatory, or biochemical functions of a naturally occurring molecule.

“Biological material”, as used herein, refers to: a portion or portions of one or more cells, organs, or organisms; a whole cell, organelle, organ, or organism; or a group of cells, organelles, organs, or organisms. For example, if the organism(s) supplying the biological material is a garden variety carrot, a single leaf of one carrot plant could be used, or one or more whole carrot plant(s) could be used, or partial or whole taproots from a number of different individuals could be used, or mitochondria extracted from the crown of one carrot plant could be used.

“Cell culture”, as used herein, refers to a proliferating mass of cells that may be in either an undifferentiated or differentiated state.

“Chimeric plasmid”, as used herein, refers to any recombinant plasmid formed (by cloning techniques) from nucleic acids derived from organisms that do not normally exchange genetic information (for example, Escherichia coli and Saccharomyces cerevisiae).

“Chimeric sequence” or “chimeric gene”, as used herein, refer to a nucleotide sequence derived from at least two heterologous parts. The sequence may comprise DNA or RNA.

“Chromatogram”, as used herein, refers to an electronic and/or graphic record of data representing the absolutely or relatively quantitative detection of a plurality of separated chemical species obtained or derived from a group of metabolites, whether or not such separation has been performed by chromatography or some other method (e.g., electrophoresis).

“Control chromatogram”, as used herein, refers to an individual chromatogram, or an average chromatogram based on multiple individual chromatograms or a mathematical model based on multiple individual chromatograms, of chemical species obtained from a group of metabolites extracted from “control” biological material.

“Subject chromatogram”, as used herein, refers to an individual chromatogram, or an average or model chromatogram based on multiple individual chromatograms, of chemical species obtained from a group of metabolites extracted from “subject” biological material. In either case, a model chromatogram may contain data including, e.g.: peak migration distance (or elution time) ranges and averages; peak height and peak area ranges and averages; and other parameters.

“Chromatographic data”, as used herein, refers to chromatograms (e.g., including, but not limited to, total ion chromatograms or chromatograms generated from flame ionization detection) corresponding to individual biological or reference samples. Data such as retention time, retention index, peak areas, and peak areas normalized to internal standards can be extracted from total ion chromatograms to generate “peak tables.”

“Coding sequence”, as used herein, refers to a deoxyribonucleotide sequence that, when transcribed and translated, results in the formation of a cellular polypeptide or a ribonucleotide sequence that, when translated, results in the formation of a cellular polypeptide.

“Compatible”, as used herein, refers to the capability of operating with other components of a system. A vector or plant viral nucleic acid that is compatible with a host is one that is capable of replicating in that host. A coat protein that is compatible with a viral nucleotide sequence is one capable of encapsidating that viral sequence.

“Coding region”, as used herein, refers to that portion of a gene that codes for a protein. The term “non-coding region” refers to that portion of a gene that is not a coding region.

“Complementary” or “complementarity”, as used herein, refer to the Watson-Crick base-pairing of two nucleic acid sequences. For example, the sequence 5′-AGT-3′ binds to the complementary sequence 3′-TCA-5′. Complementarity between two nucleic acid sequences may be “partial”, in which only some of the bases bind to their complement, or it may be complete, as when every base in the sequence binds to its complementary base. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.
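The distinction between complete and partial complementarity can be illustrated with a minimal Python sketch. It is not part of the disclosed methods; the function names `complement_3to5` and `fraction_paired` are hypothetical, and the sketch assumes equal-length, ungapped DNA strands containing only A, C, G, and T.

```python
# Illustrative sketch only; names are hypothetical and not taken from the specification.
# Assumes DNA strands of equal length with no gaps and only the bases A, C, G, T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_3to5(seq_5to3: str) -> str:
    """Return the Watson-Crick complement, written 3'->5', position by position."""
    return "".join(COMPLEMENT[base] for base in seq_5to3.upper())


def fraction_paired(strand_5to3: str, other_3to5: str) -> float:
    """Fraction of aligned positions that form Watson-Crick pairs.

    1.0 corresponds to complete complementarity; values between 0 and 1
    correspond to partial complementarity.
    """
    if not strand_5to3 or len(strand_5to3) != len(other_3to5):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        COMPLEMENT[a] == b
        for a, b in zip(strand_5to3.upper(), other_3to5.upper())
    )
    return paired / len(strand_5to3)
```

Applied to the example in the definition, `complement_3to5("AGT")` yields the 3′-TCA-5′ strand, and `fraction_paired("AGT", "TCA")` reports complete complementarity.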

“Contig”, as used herein, refers to a nucleic acid sequence that is derived from the contiguous assembly of two or more nucleic acid sequences.

“Control biological material” and “subject biological material”, as used herein, both refer to biological material taken from (cultivated/domesticated or uncultivated/non-domesticated wild-type or genetically modified) individual(s) of any taxonomic category or categories, i.e. kingdom, phylum, subphylum, class, subclass, order, suborder, family, subfamily, genus, subgenus, species, subspecies, variety, breed, or strain. The “control” and “subject” biological material may be, and typically are, taken from individual(s) of the same taxonomic category, preferably from the same species, subspecies, variety, breed, or strain. However, when comparison between different types of organisms is desired, the “control” and “subject” biological material may be taken from individual(s) of different taxonomic categories. The “control” and “subject” biological materials differ from each other in at least one way. This difference may be that the “control” and “subject” biological materials were obtained from individual(s) of different taxonomic categories. Alternatively, or additionally, they may be different parts of the same organ(s), they may be different organelles or different groups of organelles, different cells or different groups of cells, different organs or different groups of organs, or different whole organisms or different groups of whole organisms. The difference may be that the organisms providing the biological materials are identical, but for, e.g., their growth stages.

“Correlates with expression of a polynucleotide”, as used herein, indicates that the detection of the presence of ribonucleic acid that is similar to a nucleic acid (for example, SEQ ID NOs: 1-7554) is indicative of the presence of mRNA encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) in a sample and thereby correlates with expression of the transcript from the polynucleotide encoding the protein.

“Customized reporting”, as used herein, refers to the modification of a preliminary analyst report to generate an interim report (e.g., including, but not limited to, a modified analyst report and a cross-referenced modified analyst report) and a final report. In some embodiments, modifications include, but are not limited to, substitution of underivatized compound names for derivatized compound names and generation of a hit score. In other embodiments, customized reporting includes data mining of databases to generate biochemical profiling and genetic expression information and/or reports.

“Data analysis and reporting software”, as used herein, refers to software configured for the analysis of spectroscopic and chromatographic data corresponding to biological subject and reference samples. Data analysis and reporting software is configured to perform data reduction, two-dimensional peak matching, quantitative peak differentiation, peak identification, querying, data mining, and customized reporting functions.

“Data reduction”, as used herein, refers to the process of organizing, compiling, and normalizing data (for example, chromatographic and spectroscopic data). In some embodiments, data reduction includes the normalization of raw chromatogram peak areas and the generation of peak tables. In some embodiments, data reduction also includes the process of filtering peaks based on their normalized area. This step removes peaks that are considered to be background.
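A minimal Python sketch of such a data reduction step follows. It is an illustration under stated assumptions rather than the software described herein: the `Peak` type, the `reduce_peaks` function, normalization by a single internal standard area, and the background cutoff `min_normalized_area` are all hypothetical choices made for the example.

```python
# Illustrative sketch only; the types, names, and cutoff are hypothetical assumptions.
from dataclasses import dataclass


@dataclass
class Peak:
    retention_time: float  # assumed to be in minutes
    raw_area: float        # raw chromatogram peak area, arbitrary units


def reduce_peaks(peaks, internal_standard_area, min_normalized_area):
    """Normalize raw peak areas and drop peaks treated as background.

    Each raw area is divided by the internal standard area (an assumed
    normalization scheme). Peaks whose normalized area falls below
    min_normalized_area are filtered out. Returns a simple peak table
    as a list of dictionaries.
    """
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    peak_table = []
    for peak in peaks:
        normalized = peak.raw_area / internal_standard_area
        if normalized >= min_normalized_area:
            peak_table.append(
                {
                    "retention_time": peak.retention_time,
                    "normalized_area": normalized,
                }
            )
    return peak_table
```

For instance, with an internal standard area of 100.0 and a cutoff of 0.05, a peak of raw area 50.0 is retained with normalized area 0.5, while a peak of raw area 1.0 is removed as background.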

“Data sorting”, as used herein, refers to the generation of a preliminary analyst report. In some embodiments, the preliminary analyst report can include equivalence value, retention time, retention index, normalized peak area, peak identification status, compound name or other unique identifier, compound identification number (e.g., a CAS number), mass spectral library name, ID number, MS-XCR value, relative % change, notes, and other information about the biological sample.

“Data mining”, as used herein, refers to the process of querying and mining databases to analyze and to obtain information (e.g., to use in the generation of customized reports of information pertaining to biochemical profiling and gene function and expression).

“Deletion”, as used herein, refers to a change made in either an amino acid or nucleotide sequence resulting in the absence of one or more amino acids or nucleotides, respectively.

“Encapsidation”, as used herein, refers to the process during virion assembly in which nucleic acid becomes incorporated in the viral capsid or in a head/capsid precursor (for example, in certain bacteriophages).

“Exon”, as used herein, refers to a polynucleotide sequence in a nucleic acid that encodes information for protein synthesis and that is copied and spliced together with other such sequences to form messenger RNA.

“Expression”, as used herein, is meant to incorporate transcription, reverse transcription, and translation.

“Expressed sequence tag (EST)”, as used herein, refers to relatively short single-pass DNA sequences obtained from one or more ends of cDNA clones and RNA derived therefrom. They may be present in either the 5′ or the 3′ orientation. ESTs have been shown to be useful for identifying particular genes.

“Fractionated biological sample”, as used herein, refers to a biological sample that has been fractionated into two or more fractions based on one or more properties of the sample. For example, in some embodiments, leaf extracts are fractionated based on extraction with organic solvents.

“Industrial crop”, as used herein, refers to crops grown primarily for consumption by humans or animals or use in industrial processes (for example, as a source of fatty acids for manufacturing or sugars for producing alcohol). It will be understood that either the plant or a product produced from the plant (for example, sweeteners, oil, flour, or meal) can be consumed. Examples of food crops include, but are not limited to, corn, soybean, rice, wheat, oilseed rape, cotton, oats, barley, and potato plants.

“Foreign gene”, as used herein, refers to any sequence that is not native to the organism.

“Fusion protein”, as used herein, refers to a protein containing amino acid sequences from each of two distinct proteins; it is formed by the expression of a recombinant gene in which two coding sequences have been joined together such that their reading frames are in phase. Hybrid genes of this type may be constructed in vitro in order to label the product of a particular gene with a protein that can be more readily assayed (for example, a gene fused with lacZ in E. coli to obtain a fusion protein with β-galactosidase activity). Alternatively, a protein may be linked to a signal peptide to allow its secretion by the cell. The products of certain viral oncogenes are fusion proteins.

“Gene”, as used herein, refers to a discrete nucleic acid sequence responsible for a discrete cellular product. The term “gene”, as used herein, refers not only to the nucleotide sequence encoding a specific protein, but also to any adjacent 5′ and 3′ non-coding nucleotide sequence involved in the regulation of expression of the protein encoded by the gene of interest. These non-coding sequences include terminator sequences, promoter sequences, upstream activator sequences, regulatory protein binding sequences, and the like. These non-coding sequence gene regions may be readily identified by comparison with previously identified eukaryotic non-coding sequence gene regions. Furthermore, the person of average skill in the art of molecular biology is able to identify the nucleotide sequences forming the non-coding regions of a gene using well-known techniques such as site-directed mutagenesis, sequential deletion, promoter probe vectors, and the like.

“Genetically modified” and “genetically unmodified”, when used in relation to subject biological material and control biological material, respectively, refer to the fact that the subject biological material has been treated to produce a genetic modification thereof, whereas the control biological material has not received that particular genetic modification. In this context, the term “genetically unmodified” does not imply that the “control” biological material must be, e.g., a naturally-occurring, wild-type plant; rather, both the control and subject biological materials may be (but need not be) the result of, e.g., hybridization, selection, or genetic engineering.

“Growth cycle”, as used herein, is meant to include the replication of a nucleus, an organelle, a cell, or an organism.

“Heterologous”, as used herein, refers to the association of a molecular or genetic element with a distinctly different type of molecular or genetic element.

“HIT” and/or “hit”, as used herein, refers to the result of a test or series of tests that meets defined criteria for each test. “Hit Detection”, as used herein, refers to the process of determining a hit using a mathematical or statistical model.

“Host”, as used herein, refers to a cell, tissue or organism capable of replicating a vector or plant viral nucleic acid and that is capable of being infected by a virus containing the viral vector or plant viral nucleic acid. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues or organisms, where appropriate.

The term “homolog”, as in a “homolog” of a given nucleic acid sequence, as used herein, refers to a nucleic acid sequence (for example, a nucleic acid sequence from another organism) that shares a given degree of “homology” with the nucleic acid sequence.

“Homology”, as used herein, refers to a degree of complementarity. There may be partial homology or complete homology (identity). A partially complementary sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid and is referred to using the functional term “substantially homologous”. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (the hybridization) of a completely homologous sequence to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (selective) interaction. The absence of non-specific binding may be tested by the use of a second target that lacks even a partial degree of complementarity (for example, less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

Numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (for example, the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions that promote hybridization under conditions of high stringency (for example, increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term “substantially homologous” refers to any probe that can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described above.

A gene may produce multiple RNA species that are generated by differential splicing of the primary RNA transcript. cDNAs that are splice variants of the same gene will contain regions of sequence identity or complete homology (representing the presence of the same exon or portion of the same exon on both cDNAs) and regions of complete non-identity (for example, representing the presence of exon “A” on cDNA 1 wherein cDNA 2 contains exon “B” instead). Because the two cDNAs contain regions of sequence identity they will both hybridize to a probe derived from the entire gene or portions of the gene containing sequences found on both cDNAs; the two splice variants are therefore substantially homologous to such a probe and to each other.

When used in reference to a single-stranded nucleic acid sequence, the term “substantially homologous” refers to any probe that can hybridize to (that is, it is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.

The term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (for example, the strength of the association between the nucleic acids) are affected by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the T_(m) of the formed hybrid, and the G:C ratio within the nucleic acids.

“Hybridization complex”, as used herein, refers to a complex formed between nucleic acid strands by virtue of hydrogen bonding, stacking or other non-covalent interactions between bases. A hybridization complex may be formed in solution or between nucleic acid sequences present in solution and nucleic acid sequences immobilized on a solid support (for example, membranes, filters, chips, pins or glass slides to which cells have been fixed for in situ hybridization).

“Immunologically active”, as used herein, refers to the capability of a natural, recombinant, or synthetic polypeptide, or any oligopeptide thereof, to bind with specific antibodies and induce a specific immune response in appropriate animals or cells.

“Induction” and the terms “induce”, “induction” and “inducible”, as used herein, refer generally to a gene and a promoter operably linked thereto which is in some manner dependent upon an external stimulus, such as a molecule, in order to actively transcribe and/or translate the gene.

“Infection”, as used herein, refers to the ability of a virus to transfer its nucleic acid to a host or introduce viral nucleic acid into a host, wherein the viral nucleic acid is replicated, viral proteins are synthesized, and new viral particles assembled. In this context, the terms “transmissible” and “infective” are used interchangeably herein.

“Insertion” or “addition”, as used herein, refers to the replacement or addition of one or more nucleotides or amino acids, to a nucleotide or amino acid sequence, respectively.

“In cis”, as used herein, indicates that two sequences are positioned on the same strand of RNA or DNA.

“In trans”, as used herein, indicates that two sequences are positioned on different strands of RNA or DNA.

“Intron”, as used herein, refers to a polynucleotide sequence in a nucleic acid that does not encode information for protein synthesis and is removed before translation of messenger RNA.

“Isolated”, as used herein, refers to a polypeptide or polynucleotide molecule separated from other peptides, DNAs, or RNAs, respectively, that are present in the natural source of the macromolecule. “Isolated” and “purified” do not encompass either natural materials in their native state or natural materials that have been separated into components (for example, in an acrylamide gel) but not obtained either as pure substances or as solutions.

“Kinase”, as used herein, refers to an enzyme (for example, hexokinase and pyruvate kinase) that catalyzes the transfer of a phosphate group from one substrate (commonly ATP) to another.

“Marker” or “genetic marker”, as used herein, refers to a genetic locus that is associated with a particular, usually readily detectable, genotype or phenotypic characteristic (for example, an antibiotic resistance gene).

“Metabolic characteristics”, as used herein, refers to a biochemical/chemical trait/metabolite that is genetically expressed in a biological system. “Altered metabolic characteristic”, as used herein, refers to the production of a given metabolite that has been altered (for example, increased or decreased) in a biological system, especially plants. Examples of metabolites that can be altered in a plant include, but are not limited to, acids, fatty acids, amino acids, hydroxy fatty acids, branched fatty acids, carbohydrates, hydrocarbons, glycerides, phenols, sterols, oxygenated terpenes, and other isoprenoids, alcohols, ketones, quinones, alkenes and alkynes.

“Metabolome”, as used herein, indicates the complement of relatively low molecular weight molecules that is present in a plant, plant part, or plant sample, or in a suspension or extract thereof. Examples of such molecules include, but are not limited to: acids and related compounds; mono-, di-, and tri-carboxylic acids (saturated, unsaturated, aliphatic and cyclic, aryl, alkaryl); aldo-acids, keto-acids; lactone forms; gibberellins; abscisic acid; alcohols, polyols, derivatives, and related compounds; ethyl alcohol, benzyl alcohol, methanol; propylene glycol, glycerol, phytol; inositol, furfuryl alcohol, menthol; aldehydes, ketones, quinones, derivatives, and related compounds; acetaldehyde, butyraldehyde, benzaldehyde, acrolein, furfural, glyoxal; acetone, butanone; anthraquinone; carbohydrates; mono-, di-, tri-saccharides; alkaloids, amines, and other bases; pyridines (including nicotinic acid, nicotinamide); pyrimidines (including cytosine, thymine); purines (including guanine, adenine, xanthines/hypoxanthines, kinetin); pyrroles; quinolines (including isoquinolines); morphinans, tropanes, cinchonans; nucleotides, oligonucleotides, derivatives, and related compounds; guanosine, cytidine, adenosine, thymidine, inosine; amino acids, oligopeptides, derivatives, and related compounds; esters; phenols and related compounds; heterocyclic compounds and derivatives; pyrroles, tetrapyrroles (corrinoids and porphines/porphyrins, with or without metal ion); flavonoids; indoles; lipids (including fatty acids and triglycerides), derivatives, and related compounds; carotenoids, phytoene; and sterols, isoprenoids including terpenes.

“Modulate”, as used herein, refers to a change or an alteration in the biological activity of a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention). Modulation may be an increase or a decrease in protein activity, a change in binding characteristics, or any other change in the biological, functional or immunological properties of the polypeptide.

“Movement protein”, as used herein, refers to a noncapsid protein required for cell to cell movement of replicons or viruses in plants.

“Multigene family”, as used herein, refers to a set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the histones, hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins.

“Non-native”, as used herein, refers to any RNA sequence that promotes production of subgenomic mRNA including, but not limited to, 1) plant viral promoters such as ORSV and brome mosaic virus, 2) viral promoters from other organisms such as human Sindbis viral promoter, and 3) synthetic promoters.

“Nucleic acid sequence”, as used herein, refers to a polymer of nucleotides in which the 3′ position of one nucleotide sugar is linked to the 5′ position of the next by a phosphodiester bridge. In a linear nucleic acid strand, one end typically has a free 5′ phosphate group, the other a free 3′ hydroxyl group. Nucleic acid sequences may be used herein to refer to oligonucleotides, or polynucleotides, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin that may be single- or double-stranded, and represent the sense or antisense strand.

“Polypeptide”, as used herein, refers to an amino acid sequence obtained from any species and from any source whether natural, synthetic, semi-synthetic, or recombinant.

“Principal component analysis”, as used herein, refers to algorithms designed to represent large and complex data sets by linear combinations of the original variables. These linear combinations of variables are extracted to maximize the explained variance and are mutually orthogonal.
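
As an illustration of the definition above, the following is a minimal sketch, not the procedure used in the invention. It assumes the data set is a small table of samples (rows) by variables (columns) and finds only the first principal component, using power iteration on the covariance matrix as a stand-in technique. The function name `first_principal_component` and its parameters are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit_vector, explained_variance_ratio) for the first component.

    Hypothetical helper: 'rows' is a list of equal-length lists of numbers.
    """
    n = len(rows)
    p = len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the original variables.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the direction of maximum variance.
    # Assumption: the start vector is not orthogonal to that direction.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance along v divided by the total variance of all variables.
    along = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total = sum(cov[i][i] for i in range(p))
    return v, (along / total if total else 0.0)
```

The returned vector holds the weights of the linear combination of the original variables; further components would be obtained from the variance that remains after removing this one, each orthogonal to those before it.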

“Oil-producing species”, as used herein, refers to plant species that produce and store triacylglycerol in specific organs, primarily in seeds. Such species include soybean (Glycine max), rapeseed and canola (including Brassica napus and B. campestris), sunflower (Helianthus annuus), cotton (Gossypium hirsutum), corn (Zea mays), cocoa (Theobroma cacao), safflower (Carthamus tinctorius), oil palm (Elaeis guineensis), coconut palm (Cocos nucifera), flax (Linum usitatissimum), castor (Ricinus communis) and peanut (Arachis hypogaea). The group also includes non-agronomic species that are useful in developing appropriate expression vectors such as tobacco, rapid cycling Brassica species, and Arabidopsis thaliana, and wild species that may be a source of unique fatty acids.

“Operably linked”, as used herein, refers to a juxtaposition of components, particularly nucleotide sequences, such that the normal function of the components can be performed. Thus, a coding sequence that is operably linked to regulatory sequences refers to a configuration of nucleotide sequences wherein the coding sequences can be expressed under the regulatory control, that is, transcriptional and/or translational control, of the regulatory sequences.

“Origin of assembly”, as used herein, refers to a sequence where self-assembly of the viral RNA and the viral capsid protein initiates to form virions.

“Ortholog”, as used herein, refers to genes that have evolved from an ancestral locus.

“Outlier peak”, as used herein, indicates a peak of a chromatogram of a test sample, or the relative or absolute detected response data, or amount or concentration data thereof. An outlier peak: 1) may have a significantly different peak height or area as compared to a like chromatogram of a control sample; or 2) may be an additional or missing peak as compared to a like chromatogram of a control sample.

“Overexpression”, as used herein, refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms.

“Cosuppression”, as used herein, refers to the expression of a foreign gene that has substantial homology to an endogenous gene resulting in the suppression of expression of both the foreign and the endogenous gene. As used herein, the term “altered levels” refers to the production of gene product(s) in transgenic organisms in amounts or portions that differ from that of normal or non-transformed organisms.

“Peak identification”, as used herein, refers to the characterization and identification of a chemical compound represented by a given chromatographic peak. In some embodiments, the chemical compound corresponding to a given peak is identified by searching mass spectral libraries. In other embodiments, the chemical compounds are identified by searching additional libraries or databases (for example, biotechnology databases).

“Quantitative peak differentiation”, as used herein, refers to the process of confirming matched peaks by calculating their relative quantitative differentiation, which is expressed as a percent change of the subject peak area relative to the area of the reference peak. A predetermined threshold for change is used to confirm that the difference between the peaks represents a biologically significant alteration.
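
The percent-change calculation in this definition can be sketched as follows. This is an illustrative example only: the function names and the default threshold of 50% are hypothetical placeholders, since the definition does not state the threshold value.

```python
def percent_change(subject_area, reference_area):
    """Percent change of the subject peak area relative to the reference peak area."""
    if reference_area <= 0:
        raise ValueError("reference peak area must be positive")
    return 100.0 * (subject_area - reference_area) / reference_area


def exceeds_threshold(subject_area, reference_area, threshold_percent=50.0):
    """True when the absolute percent change meets a predetermined threshold.

    The 50% default is an assumed value for illustration only.
    """
    return abs(percent_change(subject_area, reference_area)) >= threshold_percent
```

For example, a subject peak area of 150 against a reference area of 100 is a +50% change, and an area of 40 against 100 is a -60% change; both meet the assumed threshold.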

“Plant”, as used herein, refers to any plant and progeny thereof. The term also includes parts of plants, including seed, cuttings, tubers, fruit, flowers, etc.

“Plant cell”, as used herein, refers to the structural and physiological unit of plants, consisting of a protoplast and the cell wall.

“Plant organ”, as used herein, refers to a distinct and visibly differentiated part of a plant, such as root, stem, leaf or embryo.

“Plant tissue”, as used herein, refers to any tissue of a plant in planta or in culture. This term is intended to include a whole plant, plant cell, plant organ, protoplast, cell culture, or any group of plant cells organized into a structural and functional unit.

“Portion”, as used herein, with regard to a protein (“a portion of a given protein”) refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid (10 nucleotides, 20, 30, 40, 50, 100, 200, etc.). A “portion” is preferably at least 25 nucleotides, more preferably at least 50 nucleotides, and even more preferably at least 100 nucleotides.

“Positive-sense inhibition”, as used herein, refers to a type of gene regulation based on cytoplasmic inhibition of gene expression due to the presence in a cell of an RNA molecule substantially homologous to at least a portion of the mRNA being translated.

“Production cell”, as used herein, refers to a cell, tissue or organism capable of replicating a vector or a viral vector, but which is not necessarily a host to the virus. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues or organisms, such as bacteria, yeast, fungus, and plant tissue.

“Promoter”, as used herein, refers to the 5′-flanking, non-coding sequence adjacent to a coding sequence that is involved in the initiation of transcription of the coding sequence.

“Protoplast”, as used herein, refers to an isolated plant cell without cell walls, having the potency for regeneration into cell culture or a whole plant.

“Purified”, as used herein, when referring to a peptide or nucleotide sequence, indicates that the molecule is present in the substantial absence of other biological macromolecules, for example, polypeptides, polynucleic acids, and the like of the same type. The term “purified” as used herein preferably means at least 95% by weight, more preferably at least 99.8% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 1000 can be present).

“Pure”, as used herein, preferably has the same numerical limits as “purified” immediately above. “Substantially purified”, as used herein, refers to nucleic or amino acid sequences that are removed from their natural environment, isolated or separated, and are at least 60% free, preferably 75% free, and most preferably 90% free from other components with which they are naturally associated.

“Recombinant plant viral nucleic acid”, as used herein, refers to a plant viral nucleic acid that has been modified to contain non-native nucleic acid sequences. These non-native nucleic acid sequences may be from any organism or purely synthetic; however, they may also include nucleic acid sequences naturally occurring in the organism into which the recombinant plant viral nucleic acid is to be introduced.

“Recombinant plant virus”, as used herein, refers to a plant virus containing a recombinant plant viral nucleic acid.

“Reference sample”, as used herein, refers to a sample taken from an individual receiving treatment that is not believed to alter the chemistry thereof.

“Regulated”, as used herein, refers to an alteration that occurs in an expressed metabolite in a biological system. “Up-regulated”, as used herein, refers to an increase in a given metabolite level relative to a control or reference. “Down-regulated”, as used herein, refers to a decrease in a given metabolite level relative to a control or reference.

“Regulatory region” or “regulatory sequence”, as used herein, in reference to a specific gene refers to the non-coding nucleotide sequences within that gene that are necessary or sufficient to provide for the regulated expression of the coding region of a gene. Thus the term regulatory region includes promoter sequences, regulatory protein binding sites, upstream activator sequences, and the like. Specific nucleotides within a regulatory region may serve multiple functions. For example, a specific nucleotide may be part of a promoter and participate in the binding of a transcriptional activator protein.

“Replication origin”, as used herein, refers to the minimal terminal sequences in linear viruses that are necessary for viral replication.

“Replicon”, as used herein, refers to an arrangement of RNA sequences generated by transcription of a transgene that is integrated into the host DNA that is capable of replication in the presence of a helper virus. A replicon may require sequences in addition to the replication origins for efficient replication and stability.

“Sample”, as used herein, is used in its broadest sense. A biological sample suspected of containing nucleic acid encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) or fragments thereof may comprise a tissue, a cell, an extract from cells, chromosomes isolated from a cell (for example, a spread of metaphase chromosomes), genomic DNA (in solution or bound to a solid support such as for Southern analysis), RNA (in solution or bound to a solid support such as for northern analysis), cDNA (in solution or bound to a solid support), and the like.

“Site-directed mutagenesis”, as used herein, refers to the in vitro induction of mutagenesis at a specific site in a given target nucleic acid molecule.

“Subgenomic promoter”, as used herein, refers to a promoter of a subgenomic mRNA of a viral nucleic acid.

“Subject sample”, as used herein, refers to a sample taken from an individual that has been treated in order to alter the chemistry thereof.

“T_(m)” is used in reference to the “melting temperature”. The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half-dissociated into single strands. The equation for calculating the T_(m) of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the T_(m) value may be calculated by the equation: T_(m) = 81.5 + 0.41(% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl [See for example, Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)]. Other references include more sophisticated computations that take structural as well as sequence characteristics into account for the calculation of T_(m).
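
The simple estimate given above, T_(m) = 81.5 + 0.41(% G+C), can be worked through in a few lines. This is a minimal sketch: the function names are hypothetical, the result is in degrees Celsius, and the estimate applies only under the stated condition (aqueous solution at 1 M NaCl).

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a nucleic acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def simple_melting_temperature(sequence):
    """Simple T_(m) estimate: 81.5 + 0.41 * (% G+C)."""
    return 81.5 + 0.41 * gc_percent(sequence)
```

A sequence with 50% G+C content gives 81.5 + 0.41 × 50 = 102.0 under this estimate; the more sophisticated computations mentioned above would also account for length and structure.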

“Stringency” is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Those skilled in the art will recognize that “stringency” conditions may be changed by varying the parameters just described either individually or in concert. With “high stringency” conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences (for example, hybridization under “high stringency” conditions may occur between homologs with about 85-100% identity, preferably about 70-100% identity). With medium stringency conditions, nucleic acid base pairing will occur between nucleic acids with an intermediate frequency of complementary base sequences (for example, hybridization under “medium stringency” conditions may occur between homologs with about 50-70% identity). Thus, conditions of “weak” or “low” stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

“High stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/mL denatured salmon sperm DNA followed by washing in a solution comprising 0.1×SSPE, 1.0% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

“Medium stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/mL denatured salmon sperm DNA followed by washing in a solution comprising 1.0×SSPE, 1.0% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

“Low stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5× Denhardt's reagent [50× Denhardt's contains per 500 mL: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/mL denatured salmon sperm DNA followed by washing in a solution comprising 5×SSPE, 0.1% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

“Substitution”, as used herein, refers to a change made in an amino acid or nucleotide sequence that results in the replacement of one or more amino acids or nucleotides by different amino acids or nucleotides, respectively.

“Symptom”, as used herein, refers to a visual condition resulting from the action of the GENEWARE™ (trademark of Large Scale Biology Corporation) vector or the clone insert. The GENEWARE™ vector is described in U.S. application Ser. No. 09/008,186 (incorporated herein by reference).

“Systemic infection”, as used herein, denotes infection throughout a substantial part of an organism, including mechanisms of spread other than mere direct cell inoculation, such as transport from one infected cell to additional cells either nearby or distant.

“Transcription”, as used herein, refers to the production of an RNA molecule by RNA polymerase as a complementary copy of a DNA sequence.

“Transformation”, as used herein, describes a process by which exogenous DNA enters and changes a recipient cell. It may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed and may include, but is not limited to, viral infection, electroporation, lipofection, and particle bombardment. Such “transformed” cells include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome. They also include cells that transiently express the inserted DNA or RNA for limited periods of time.

“Transfection”, as used herein, refers to the introduction of foreign nucleic acid into eukaryotic cells. Transfection may be accomplished by a variety of means known to the art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral infection, and biolistics. Transfection may, for example, result in cells in which the inserted nucleic acid is capable of replication either as an autonomously replicating molecule or as part of the host chromosome, or cells that transiently express the inserted nucleic acid for limited periods of time.

“Transgenic plant”, as used herein, refers to a plant that contains a foreign nucleotide sequence inserted into either its nuclear genome or organellar genome.

“Transgene”, as used herein, refers to the DNA sequence coding for the replicon that is inserted into the host DNA.

“Two-dimensional peak matching”, as used herein, refers to the pairing or matching of peaks in reference and subject biological samples. Peaks are first paired based on their retention index. A match is then confirmed by spectral matching.
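
The two-step pairing described above can be sketched as follows. This is an illustrative example under several assumptions that the definition does not specify: each peak is a dictionary with a retention index (`"ri"`) and a mass spectrum (`"spectrum"`, mapping m/z to intensity); spectral matching is approximated by cosine similarity; and the tolerance and similarity cutoff are hypothetical values.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(v * spec_b.get(k, 0.0) for k, v in spec_a.items())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_peaks(reference, subject, ri_tolerance=5.0, min_similarity=0.8):
    """Pair peaks by retention index, then confirm each pair by spectrum.

    Returns (matches, unmatched_reference, unmatched_subject) as index lists.
    ri_tolerance and min_similarity are assumed values for illustration.
    """
    matches = []
    used = set()
    for r_idx, ref in enumerate(reference):
        best = None
        for s_idx, sub in enumerate(subject):
            # Dimension 1: retention index must agree within the tolerance.
            if s_idx in used or abs(sub["ri"] - ref["ri"]) > ri_tolerance:
                continue
            # Dimension 2: the spectra must be sufficiently similar.
            sim = cosine_similarity(ref["spectrum"], sub["spectrum"])
            if sim >= min_similarity and (best is None or sim > best[1]):
                best = (s_idx, sim)
        if best is not None:
            used.add(best[0])
            matches.append((r_idx, best[0]))
    matched_ref = {m[0] for m in matches}
    unmatched_ref = [i for i in range(len(reference)) if i not in matched_ref]
    unmatched_sub = [i for i in range(len(subject)) if i not in used]
    return matches, unmatched_ref, unmatched_sub
```

Peaks left without a partner in either list correspond to peaks present in one sample but missing from the other.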

“Unmatched peak”, as used herein, refers to a peak reported in the chromatographic and/or spectroscopic data corresponding to reference biological sample but missing from chromatographic and/or spectroscopic data corresponding to subject biological sample, based upon the criteria for quantitation and reporting, or a peak reported in chromatographic and/or spectroscopic data corresponding to subject biological sample but missing from chromatographic and/or spectroscopic data corresponding to reference biological sample, based upon criteria for quantitation and reporting.

“Variants” of a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), as used herein, refers to a sequence resulting when a polypeptide is modified by one or more amino acids. The variant may have “conservative” changes, wherein a substituted amino acid has similar structural or chemical properties, for example, replacement of leucine with isoleucine. More rarely, a variant may have “nonconservative” changes, for example, replacement of a glycine with a tryptophan. Variants may also include sequences with amino acid deletions or insertions, or both. Guidance in determining which amino acid residues may be substituted, inserted, or deleted without abolishing biological or immunological activity may be found using computer programs well known in the art.

“Vector”, as used herein, refers to a self-replicating DNA or RNA molecule that transfers a nucleic acid segment between cells.

“Virion”, as used herein, refers to a particle composed of viral RNA and viral capsid protein.

“Virus”, as used herein, refers to an infectious agent composed of a nucleic acid encapsidated in a protein. A virus may be a mono-, di-, tri- or multi-partite virus.

DESCRIPTION OF THE INVENTION

I. Identification of Nucleotide and Amino Acid Sequences

The invention is based on the discovery of deoxyribonucleic acid (DNA) and amino acid sequences that confer altered metabolic characteristics when expressed in plants. In particular, the present invention encompasses the nucleic acid sequences encoded by SEQ ID NOs:1-1165, 3703-4153 and 7389-7554 and variants and portions thereof. These sequences are contiguous sequences prepared from a database of 5′ single-pass sequences and are thus referred to as contig sequences.

Nucleic acids of the present invention were identified in clones generated from a variety of cDNA libraries. The cDNA libraries were constructed in the GENEWARE™ vector. The GENEWARE™ vector is described in U.S. application Ser. No. 09/008,186 (incorporated herein by reference). Each of the complete set of clones from the GENEWARE™ library was used to prepare an infectious viral unit. An infectious unit corresponding to each clone was used to inoculate Nicotiana benthamiana (a dicotyledonous plant). The plants were grown under identical conditions and a phenotypic analysis of each plant was carried out. The altered metabolic characteristic was observed in the plants that had been infected by an infectious unit created from the nucleic acids of the present invention.

Following the identification of the altered metabolic characteristic in plant samples, further analyses of the sequences were carried out. In particular, the nucleotide sequences of the present invention were analyzed using bioinformatics methods as described below.

II. Bioinformatics Methods

A. Phred, Phrap and Consed

Phred, Phrap and Consed are a set of programs that read DNA sequencer traces, make base calls, assemble the shotgun DNA sequence data and analyze the sequence regions that are likely to contribute to errors. Phred is the initial program used to read the sequencer trace data, call the bases and assign quality values to the bases. Phred uses a Fourier-based method to examine the base traces generated by the sequencer. The output files from Phred are written in FASTA, phd or scf format. Phrap is used to assemble contiguous sequences from only the highest quality portion of the sequence data output by Phred. Phrap is amenable to high-throughput data collection. Finally, Consed is used as a finishing tool to assign error probabilities to the sequence data. Detailed descriptions of the Phred, Phrap and Consed software and its use can be found in the following references: Ewing et al., Genome Res., 8:175 [1998]; Ewing and Green, Genome Res. 8:186 [1998]; Gordon et al., Genome Res. 8:195 [1998].
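By way of illustration only, the quality values assigned by Phred are conventionally related to the estimated base-calling error probability P by Q = -10 log10(P), as described in Ewing and Green (supra). The following minimal Python sketch shows that relation; the function names are hypothetical and are not part of the Phred software.

```python
import math


def phred_quality(p_error: float) -> float:
    """Return the quality value Q = -10 * log10(P) for error probability P."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(quality: float) -> float:
    """Invert the relation: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # Print the error rate implied by a few commonly quoted quality values.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: estimated error probability {error_probability(q):.6f}")
```

Under this relation, a quality value of 20 corresponds to an estimated error of 1 in 100 base calls and a value of 30 to 1 in 1000.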

B. BLAST

The BLAST set of programs may be used to compare the large numbers of sequences and obtain homologies to known protein families. These homologies provide information regarding the function of newly sequenced genes. Detailed descriptions of the BLAST software and its uses can be found in the following references: Altschul et al., J. Mol. Biol., 215:403 [1990]; Altschul, J. Mol. Biol. 219:555 [1991].

Generally, BLAST performs sequence similarity searching and is divided into 5 basic subroutines: (1) BLASTP compares an amino acid sequence to a protein sequence database; (2) BLASTN compares a nucleotide sequence to a nucleic acid sequence database; (3) BLASTX compares a nucleotide sequence translated in all 6 reading frames to a protein sequence database; (4) TBLASTN compares a protein sequence to a nucleotide sequence database that is translated into all 6 reading frames; (5) TBLASTX compares the 6-frame translation of a nucleotide sequence to the 6-frame translation of a nucleotide sequence database. Subroutines (3)-(5) may be used to identify weak similarities in nucleic acid sequence.
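The 6-frame translation referred to in subroutines (3)-(5) can be sketched as follows. This is a minimal Python illustration that assumes the standard genetic code and an unambiguous DNA alphabet; it is not the BLAST implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Map frame labels +1..+3 (forward) and -1..-3 (reverse) to proteins."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
        print(f"{frame:+d}: {protein}")
```

Stop codons are shown as an asterisk. A search tool would compare each of the six translated products to the protein database rather than selecting a single frame in advance.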

The BLAST program is based on the High-scoring Segment Pair (HSP), two sequence fragments of arbitrary but equal length whose alignment is locally maximized and whose alignment score meets or exceeds a cutoff threshold. BLAST determines multiple HSP sets statistically using sum statistics. The score of the HSP is then related to its expected frequency of chance occurrence, E. The value, E, is dependent on several factors such as the scoring system, residue composition of sequences, length of query sequence and total length of database. These E values are listed in the output file, typically in a histogram format, and are useful in determining levels of statistical significance at the user's predefined expectation threshold. Finally, the Smallest Sum Probability, P(N), is the probability of observing the shown matched sequences by chance alone and is typically in the range of 0-1.
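For a single HSP, the dependence of E on score, query length and database length is commonly written in the Karlin-Altschul form E = K·m·n·e^(-λS), where S is the HSP score, m the query length, n the total database length, and K and λ are parameters fixed by the scoring system and residue composition. The Python sketch below illustrates that relation only; it does not reproduce the sum statistics used for multiple HSPs, and the numerical values of K and λ in the example are placeholder assumptions rather than values taken from this disclosure.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float, k: float) -> float:
    """Expected number of chance HSPs scoring at least `score`: K*m*n*exp(-lam*S)."""
    return k * query_len * db_len * math.exp(-lam * score)


def chance_probability(e_value: float) -> float:
    """Probability of at least one such chance HSP: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Placeholder parameters, chosen for illustration only.
    lam, k = 0.3, 0.1
    for score in (30, 50, 70):
        e = expect_value(score, query_len=250, db_len=10_000_000, lam=lam, k=k)
        print(f"S={score}: E={e:.3g}, P={chance_probability(e):.3g}")
```

Because E scales with the product of query and database lengths, the same score is less significant when searched against a larger database.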

BLAST measures sequence similarity using a matrix of similarity scores for all possible pairs of residues; these specify scores for aligning pairs of amino acids. The matrix of choice for a specific use depends on several factors: the length of the query sequence and whether a close or distant relationship between sequences is suspected. Several matrices are available including PAM40, PAM120, PAM250, BLOSUM62 and BLOSUM50. Altschul et al. (1990) found PAM120 to be the most broadly sensitive matrix (PAM denoting point accepted mutations per 100 residues). However, in some cases the PAM120 matrix may not find short but strong or long but weak similarities between sequences. In these cases, pairs of PAM matrices may be used, such as PAM40 and PAM250, and the results compared. Typically, PAM40 is used for database searching with a query of 9-21 residues long, while PAM250 is used for lengths of 47-123.

The BLOSUM (Blocks Substitution Matrix) series of matrices are constructed based on percent identity between two sequence segments of interest. Thus, the BLOSUM62 matrix is based on a matrix of sequence segments in which the members are less than 62% identical. BLOSUM62 shows very good performance for BLAST searching. However, other BLOSUM matrices, like the PAM matrices, may be useful in other applications. For example, BLOSUM45 is particularly strong in profile searching.

C. FASTA

The FASTA suite of programs permits the evaluation of DNA and protein similarity based on local sequence alignment. The FASTA search algorithm utilizes Smith/Waterman- and Needleman/Wunsch-based optimization methods. These algorithms consider all of the alignment possibilities between the query sequence and the library in the highest-scoring sequence regions. The search algorithm proceeds in four basic steps:

-   1. The identities or pairs of identities between the two DNA or protein sequences are determined. The ktup parameter, as set by the user, is operative and determines how many consecutive sequence identities are required to indicate a match.
-   2. The regions identified in step 1 are re-scored using a PAM or BLOSUM matrix. This allows conservative replacements and runs of identities shorter than that specified by ktup to contribute to the similarity score.
-   3. The region with the single best scoring initial region is used to characterize pairwise similarity and these scores are used to rank the library sequences.
-   4. The highest scoring library sequences are aligned using the Smith-Waterman algorithm. This final comparison takes into account the possible alignments of the query and library sequence in the highest scoring region.
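Step 1 of the search, in which runs of ktup consecutive identities are located, can be sketched as a word lookup that tallies matches by alignment diagonal. The Python example below is a simplified illustration under that assumption and is not the FASTA source code; the function name and the diagonal convention (library position minus query position) are hypothetical choices.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query: str, library: str, ktup: int = 2) -> Counter:
    """Count shared words of length ktup on each diagonal (library pos - query pos)."""
    # Index every ktup-length word of the query by its start position.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Scan the library sequence and tally each word hit by its diagonal.
    diagonals = Counter()
    for j in range(len(library) - ktup + 1):
        for i in positions.get(library[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


if __name__ == "__main__":
    hits = ktup_diagonals("ACGTACGT", "TTACGTACGTTT", ktup=4)
    for diagonal, count in hits.most_common():
        print(f"diagonal {diagonal:+d}: {count} word matches")
```

Diagonals with many word matches mark the candidate regions that would be re-scored with a PAM or BLOSUM matrix in step 2. A larger ktup gives a faster but less sensitive search.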

Further detailed description of the FASTA software and its use can be found in the following reference: Pearson and Lipman, Proc. Natl. Acad. Sci., 85:2444 [1988].

D. Pfam

Despite the large number of different protein sequences determined through genomics-based approaches, relatively few structural and functional domains are known. Pfam is a computational method that utilizes a collection of multiple alignments and profile hidden Markov models of protein domain families to classify existing and newly found protein sequences into structural families. Detailed descriptions of the Pfam software and its uses can be found in the following references: Sonnhammer et al., Proteins: Structure, Function and Genetics, 28:405 [1997]; Sonnhammer et al., Nucleic Acids Res., 26:320 [1998]; Bateman et al., Nucleic Acids Res., 27:260 [1999].

Pfam 3.1, the latest version, includes 54% of proteins in SWISS_PROT [For a recent reference see: Barker W. C., Garavelli J. S., Hou Z., Huang H., Ledley R. S., McGarvey P. B., Mewes H.-W., Orcutt B. C., Pfeiffer F., Tsugita A., Vinayaka C. R., Xiao C., Yeh L. S., Wu C.; Nucleic Acids Res. 29:29-32(2001)] and SP-TrEMBL-5 (a supplement to SWISS_PROT) as a match to the database and includes expectation values for matches. Pfam consists of parts A and B. Pfam-A contains a hidden Markov model and includes curated families. Pfam-B uses the Domainer program to cluster sequence segments not included in Pfam-A. Domainer uses pairwise homology data from BLASTP to construct aligned families.

Alternative protein family databases that may be used include PRINTS and BLOCKS. Both are based on a set of ungapped blocks of aligned residues. However, these databases typically contain short conserved regions whereas Pfam represents a library of complete domains that facilitates automated annotation. Comparisons of Pfam profiles may also be performed using genomic and EST (expressed sequence tag, defined as a single pass sequence of a cDNA, usually from the 5′ end) data with the programs Genewise and ESTwise, respectively. Both of these programs allow for introns and frame shifting errors.

E. BLOCKS

The determination of sequence relationships between unknown sequences and those that have been categorized can be problematic because background noise increases with the number of sequences, especially at a low level of similarity detection. One recent approach to this problem efficiently detects and confirms weak or distant relationships among protein sequences based on a database of blocks. The BLOCKS database provides multiple alignments of sequences and contains blocks or protein motifs found in known families of proteins.

Other programs such as PRINTS [The PRINTS database of protein fingerprints prepared under the supervision of Terri Attwood at the University of Manchester. Reference: Attwood T. K., Croning M. D. R., Flower D. R., Lewis A. P., Mabey J. E., Scordis P., Selley J. N. and Wright W.; Nucleic Acids Res. 28:225-227(2000)] and Prodom also provide alignments; however, the BLOCKS database differs in the manner in which the database was constructed. Construction of the BLOCKS database [S. Henikoff & J. G. Henikoff, “Protein family classification based on searching a database of blocks”, Genomics 19:97-107 (1994). S. Henikoff, J. G. Henikoff, W. J. Alford & S. Pietrokovski, “Automated construction and graphical presentation of protein blocks from unaligned sequences”, Gene-COMBIS, Gene 163 (1995) GC 17-26. S. Pietrokovski, “Searching Databases of Conserved Sequence Regions by Aligning Protein Multiple-Alignments”, NAR 24:3836-3845 (1996)] proceeds as follows: one starts with a group of sequences that presumably have one or more motifs in common, such as those from the PROSITE database [Hofmann K., Bucher P., Falquet L. and Bairoch A.; Nucleic Acids Res. 27:215-219(1999)]. The PROTOMAT program [S Henikoff & J G Henikoff, “Automated assembly of protein blocks for database searching”, NAR (1991) 19:6565-6572], using the MOTIF algorithm [H O Smith, et al, “Finding sequence motifs in groups of functionally related proteins”, PNAS (1990) 87:826-830], then uses a motif finding program to scan sequences for similarity, looking for spaced triplets of amino acids. The located blocks are then entered into the MOTOMAT program [The first step (PROTOMAT) finds candidate alignments and the second step (MOTOMAT) extends the alignments, then sorts them in such a way that a best set is chosen] for block assembly. Weights are computed for all sequences. Following construction of a BLOCKS database one can use BLIMPS [a tool to search BLOCKS; the reference is believed to be Henikoff S, Henikoff J G, Alford W J and Pietrokovski S (1995) Gene 163 GC17-26] to perform searches of the BLOCKS database. Detailed descriptions of the construction and use of a BLOCKS database can be found in the following references: Henikoff, S. and Henikoff, J. G., Genomics, 19:97 [1994]; Henikoff, J. G. and Henikoff, S., Meth. Enz., 266:88 [1996].

F. PRINTS

The PRINTS database of protein family fingerprints can be used in addition to BLOCKS and PROSITE. These databases are considered to be secondary databases because they diagnose the relationship between sequences that yield function information. Presently, however, it is not recommended that these databases be used alone. Rather, it is strongly suggested that these pattern databases be used in conjunction with each other so that a direct comparison of results can be made to analyze their robustness.

Generally, these programs utilize pattern recognition to discover motifs within protein sequences. However, PRINTS goes one step further: it takes into account not simply single motifs but several motifs simultaneously that might characterize a family signature. Other programs, such as PROSITE, rely on pattern recognition but are limited by the fact that query sequences must match them exactly. Thus, sequences that vary slightly will be missed. In contrast, the PRINTS database fingerprinting approach is capable of identifying distant relatives because sequences do not have to match the query exactly. Instead they are scored according to how well they fit each motif in the signature. Another advantage of PRINTS is that it allows the user to search both PRINTS and PROSITE simultaneously. A detailed description of the use of PRINTS can be found in the following reference: Attwood et al., Nucleic Acids Res. 25:212 [1997].

III. Nucleic Acid Sequences, Including Related, Variant, Modified and Extended Sequences

This invention encompasses nucleic acids, polypeptides encoded by the nucleic acid sequences, and variants that retain at least one biological or other functional activity of the polynucleotide or polypeptide of interest. A preferred polynucleotide variant is one having at least 80%, and more preferably 90%, sequence identity to the sequence of interest. A most preferred polynucleotide variant is one having at least 95% sequence identity to the polynucleotide of interest.
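As a non-limiting illustration of how a sequence identity percentage may be computed, the Python sketch below counts identical positions in a pair of sequences that have already been aligned (gaps written as "-"). The choice of denominator (all alignment columns that are not gap-only) is an assumption made for this example; other conventions exist and can give slightly different values. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    variant = "ACGTACGTAA"
    identity = percent_identity(reference, variant)
    # Compare against the 80%, 90% and 95% thresholds recited in the text.
    for threshold in (80, 90, 95):
        print(f"at least {threshold}%: {identity >= threshold}")
```

Here a gap opposite a residue counts as a mismatch, so insertions and deletions lower the reported identity.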

In particularly preferred embodiments, the invention encompasses the polynucleotides comprising a polynucleotide encoded by SEQ ID NOs:1-7554. In particularly preferred embodiments, the nucleic acids are operably linked to an exogenous promoter (and in most preferred embodiments to a plant promoter) or present in a vector.

It will be appreciated by those skilled in the art that as a result of the degeneracy of the genetic code, a multitude of nucleotide sequences encoding a given polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), some bearing minimal homology to the nucleotide sequences of any known and naturally occurring gene, may be produced. Thus, the invention contemplates each and every possible variation of nucleotide sequence that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the nucleotide sequence of the naturally occurring polypeptide, and all such variations are to be considered as being specifically disclosed.
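The size of this multitude can be illustrated by multiplying the number of synonymous codons available for each residue. The Python sketch below assumes the standard genetic code and uses a short hypothetical peptide that is not taken from the sequence listing.

```python
from collections import Counter
from math import prod

# Standard genetic code in TCAG codon order; each letter is one codon's product.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid (and the stop signal '*').
CODON_DEGENERACY = Counter(AMINO_ACIDS)


def count_encodings(peptide: str) -> int:
    """Number of distinct nucleotide sequences that encode the given peptide."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(CODON_DEGENERACY)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    return prod(CODON_DEGENERACY[aa] for aa in peptide)


if __name__ == "__main__":
    # Hypothetical five-residue peptide, for illustration only.
    peptide = "MLSRW"
    print(f"{peptide}: {count_encodings(peptide)} possible coding sequences")
```

Methionine and tryptophan each have a single codon, whereas leucine, serine and arginine each have six, so the count grows rapidly with peptide length.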

Although nucleotide sequences that encode a given polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) and its variants are preferably capable of hybridizing to the nucleotide sequence of the naturally occurring polypeptide under appropriately selected conditions of stringency, it may be advantageous to produce nucleotide sequences encoding the polypeptide or its derivatives possessing a substantially different codon usage. Codons may be selected to increase the rate at which expression of the peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the frequency with which particular codons are utilized by the host. Other reasons for substantially altering the nucleotide sequence encoding a polypeptide and its derivatives without altering the encoded amino acid sequences include the production of RNA transcripts having more desirable properties, such as a greater half-life, than transcripts produced from the naturally occurring sequence.

The invention also encompasses production of DNA sequences, or portions thereof, that encode a polynucleotide and its variants, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents that are well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding a polynucleotide of the present invention or any portion thereof.

Also encompassed by the invention are polynucleotide sequences that are capable of hybridizing to SEQ ID NOs:1-7554 under various conditions of stringency (for example, conditions ranging from low to high stringency). Hybridization conditions are based on the melting temperature (T_m) of the nucleic acid binding complex or probe, as taught in Wahl and Berger, Methods Enzymol., 152:399 [1987] and Kimmel, Methods Enzymol., 152:507 [1987], and may be used at a defined stringency.
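As a rough illustration of how a melting temperature depends on base composition, the Python sketch below applies the widely used rule of thumb for short oligonucleotide probes, T_m ≈ 2 °C × (A+T) + 4 °C × (G+C). This rule is substituted here for simplicity and is not the method taught in the cited references; it ignores salt and formamide concentration and becomes unreliable for longer probes. The function name is hypothetical.

```python
def rule_of_thumb_tm(oligo: str) -> int:
    """Approximate T_m in degrees C for a short oligo: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical probe sequences of equal length but different GC content.
    for probe in ("ATATATATATATATAT", "GGGGCCCCAAAATTTT", "GCGCGCGCGCGCGCGC"):
        print(f"{probe}: approximately {rule_of_thumb_tm(probe)} degrees C")
```

A higher GC content raises the estimated T_m, which is why hybridization and wash temperatures are chosen relative to the T_m of the particular probe when setting stringency.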

Modified nucleic acid sequences encoding a polynucleotide of the present invention include deletions, insertions, or substitutions of different nucleotides resulting in a polynucleotide that encodes the same polypeptide or a functionally equivalent polynucleotide or polypeptide. The encoded protein may also contain deletions, insertions, or substitutions of amino acid residues that produce a silent change and result in a functionally equivalent polypeptide. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues as long as the biological activity of the polypeptide is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid; positively charged amino acids may include lysine and arginine; and amino acids with uncharged polar head groups having similar hydrophilicity values may include leucine, isoleucine, and valine; glycine and alanine; asparagine and glutamine; serine and threonine; phenylalanine and tyrosine.

Also included within the scope of the present invention are alleles of the genes encoding polypeptides. As used herein, an “allele” or “allelic sequence” is an alternative form of the gene that may result from at least one mutation in the nucleic acid sequence. Alleles may result in modified mRNAs or polypeptides whose structure or function may or may not be modified. Any given gene may have none, one, or many allelic forms. Common mutational changes that give rise to alleles are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.

Methods for DNA sequencing that are well known and generally available in the art may be used to practice any embodiments of the invention. The methods may employ such enzymes as the Klenow fragment of DNA polymerase I, SEQUENASE (US Biochemical Corporation; Cleveland, Ohio), TAQ polymerase (U.S. Biochemical Corporation, Cleveland, Ohio), thermostable T7 polymerase (Amersham Pharmacia Biotech; Chicago, Ill.), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONGASE amplification system (Life Technologies, Inc.; Rockville, Md.). Preferably, the process is automated with machines such as the MICROLAB 2200 (Hamilton Company; Reno, Nev.), PTC200 DNA Engine thermal cycler (MJ Research; Watertown, Mass.) and the ABI 377 DNA sequencer (Perkin Elmer).

The nucleic acid sequences encoding a polynucleotide of the present invention may be extended utilizing a partial nucleotide sequence and employing various methods known in the art to detect upstream sequences such as promoters and regulatory elements. For example, one method that may be employed, “restriction-site” PCR, uses universal primers to retrieve unknown sequence adjacent to a known locus [Sarkar, PCR Methods Applic. 2:318 (1993)]. In particular, genomic DNA is first amplified in the presence of a primer to a linker sequence and a primer specific to the known region. The amplified sequences are then subjected to a second round of PCR with the same linker primer and another specific primer internal to the first one. Products of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using reverse transcriptase.

Inverse PCR may also be used to amplify or extend sequences using divergent primers based on a known region [Triglia et al., Nucleic Acids Res. 16:8186 (1988)]. The primers may be designed using OLIGO 4.06 primer analysis software (National Biosciences Inc.; Plymouth, Minn.), or another appropriate program, to be 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal to the target sequence at temperatures about 68-72° C. The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. The fragment is then circularized by intramolecular ligation and used as a PCR template.
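The length and GC-content guidelines recited above can be checked programmatically. The Python sketch below is a simple screen written for illustration; it is not the OLIGO software, it does not model the 68-72° C. annealing criterion, and the function names and example primers are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G and T")
    return (primer.count("G") + primer.count("C")) / len(primer)


def meets_primer_guidelines(primer: str, min_len: int = 22, max_len: int = 30,
                            min_gc: float = 0.5) -> bool:
    """True if the primer is 22-30 nt long with a GC content of 50% or more."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


if __name__ == "__main__":
    # Hypothetical candidate primers, for illustration only.
    candidates = [
        "GCGCGCGCGCGCATATATATAT",    # 22 nt, GC above 50%
        "ATATATATATATATATATATATAT",  # 24 nt, no GC
        "GCGCGCGCGC",                # too short
    ]
    for primer in candidates:
        verdict = "accept" if meets_primer_guidelines(primer) else "reject"
        print(f"{primer} ({len(primer)} nt, GC {gc_fraction(primer):.0%}): {verdict}")
```

A candidate passing this screen would still need its annealing temperature and its potential for self-complementarity evaluated by a dedicated primer analysis program.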

Another method that may be used is capture PCR, which involves PCR amplification of DNA fragments adjacent to a known sequence in human and yeast artificial chromosome DNA [Lagerstrom et al., PCR Methods Applic. 1:111 (1991)]. In this method, multiple restriction enzyme digestions and ligations may also be used to place an engineered double-stranded sequence into an unknown portion of the DNA molecule before performing PCR.

Another method that may be used to retrieve unknown sequences is that of Parker et al., Nucleic Acids Res., 19:3055 [1991]. Additionally, one may use PCR, nested primers, and PROMOTERFINDER DNA Walking Kits libraries (Clontech; Palo Alto, Calif.) to walk in genomic DNA. This process avoids the need to screen libraries and is useful in finding intron/exon junctions.

When screening for full-length cDNAs, it is preferable to use libraries that have been size-selected to include larger cDNAs. Also, random-primed libraries are preferable, in that they will contain more sequences that contain the 5′ regions of genes. Use of a randomly primed library may be especially preferable for situations in which an oligo d(T) library does not yield a full-length cDNA. Genomic libraries may be useful for extension of sequence into the 5′ and 3′ non-transcribed regulatory regions.

Capillary electrophoresis systems that are commercially available (for example, from PE Biosystems, Inc.; Foster City, Calif.) may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. In particular, capillary sequencing may employ flowable polymers for electrophoretic separation, four different fluorescent dyes (one for each nucleotide) that are laser activated, and detection of the emitted wavelengths by a charge coupled device camera. Output/light intensity may be converted to electrical signal using appropriate software (for example, GENOTYPER and SEQUENCE NAVIGATOR from PE Biosystems; Foster City, Calif.) and the entire process from loading of samples to computer analysis and electronic data display may be computer controlled. Capillary electrophoresis is especially preferable for the sequencing of small pieces of DNA that might be present in limited amounts in a particular sample.

It is contemplated that the nucleic acids disclosed herein can be utilized as starting nucleic acids for directed evolution. In some embodiments, artificial evolution is performed by random mutagenesis (for example, by utilizing error-prone PCR to introduce random mutations into a given coding sequence). This method requires that the frequency of mutation be finely tuned. As a general rule, beneficial mutations are rare, while deleterious mutations are common. The frequency must therefore be kept low, because the combination of a deleterious mutation and a beneficial mutation often results in an inactive enzyme. The ideal number of base substitutions for a targeted gene is usually between 1.5 and 5 [Moore and Arnold, Nat. Biotech., 14, 458-67 (1996); Leung et al., Technique, 1:11-15 (1989); Eckert and Kunkel, PCR Methods Appl., 1:17-24 (1991); Caldwell and Joyce, PCR Methods Appl., 2:28-33 (1992); and Zhao and Arnold, Nuc. Acids. Res., 25:1307-08 (1997)]. After mutagenesis, the resulting clones are selected for desirable activity. Successive rounds of mutagenesis and selection are often necessary to develop enzymes with desirable properties. It should be noted that only the useful mutations are carried over to the next round of mutagenesis.
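The effect of limiting the number of base substitutions per gene can be explored in silico. The Python sketch below is a toy simulation that introduces a fixed number of random substitutions into a sequence; it assumes that every position and every replacement base is equally likely, which real error-prone PCR does not guarantee, and the function name and parent sequence are hypothetical.

```python
import random


def mutate(sequence: str, n_substitutions: int, rng: random.Random) -> str:
    """Return a copy of `sequence` with exactly n random base substitutions."""
    if n_substitutions > len(sequence):
        raise ValueError("cannot substitute more positions than the sequence has")
    bases = list(sequence.upper())
    # Choose distinct positions so that each substitution changes a new site.
    for position in rng.sample(range(len(bases)), n_substitutions):
        alternatives = [base for base in "ACGT" if base != bases[position]]
        bases[position] = rng.choice(alternatives)
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so that the run is repeatable
    parent = "ACGT" * 25     # hypothetical 100-nt coding segment
    for clone in range(3):
        child = mutate(parent, n_substitutions=3, rng=rng)
        changed = [i for i, (a, b) in enumerate(zip(parent, child)) if a != b]
        print(f"clone {clone}: substitutions at positions {sorted(changed)}")
```

Each simulated clone would then be screened for the desired activity, and only improved variants would serve as parents for the next round.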

In other embodiments of the present invention, the polynucleotides of the present invention are used in gene shuffling or sexual PCR procedures (for example, Smith, Nature, 370:324-25 [1994]; U.S. Pat. Nos. 5,837,458; 5,830,721; 5,811,238; and 5,733,731, each of which is herein incorporated by reference). Gene shuffling involves random fragmentation of several mutant DNAs followed by their reassembly by PCR into full length molecules. Examples of various gene shuffling procedures include, but are not limited to, assembly following DNase treatment, the staggered extension process (STEP), and random priming in vitro recombination. In the DNase mediated method, DNA segments isolated from a pool of positive mutants are cleaved into random fragments with DNaseI and subjected to multiple rounds of PCR with no added primer. The lengths of random fragments approach that of the uncleaved segment as the PCR cycles proceed, resulting in mutations present in different clones becoming mixed and accumulating in some of the resulting sequences. Multiple cycles of selection and shuffling have led to the functional enhancement of several enzymes [Stemmer, Nature, 370:389-91 (1994); Stemmer, Proc. Natl. Acad. Sci. USA, 91, 10747-51 (1994); Crameri et al., Nat. Biotech., 14:315-19 (1996); Zhang et al., Proc. Natl. Acad. Sci. USA, 94:4504-09 (1997); and Crameri et al., Nat. Biotech., 15:436-38 (1997)].

IV. Vectors, Engineering, and Expression of Sequences

In another embodiment of the invention, the polynucleotide sequences of the present invention and fragments and portions thereof, may be used in recombinant DNA molecules to direct expression of an mRNA or polypeptide in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences that encode substantially the same or a functionally equivalent mRNA or amino acid sequence may be produced and these sequences may be used to clone and express polypeptides (for example, a polypeptide encoded by a nucleic acid of the present invention).

As will be understood by those of skill in the art, it may be advantageous to produce nucleotide sequences possessing non-naturally occurring codons. For example, codons preferred by a particular prokaryotic or eukaryotic host can be selected to increase the rate of protein expression or to produce a recombinant RNA transcript having desirable properties, such as a half-life that is longer than that of a transcript generated from the naturally occurring sequence.

The nucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter the polypeptide sequences for a variety of reasons, including but not limited to, alterations that modify the cloning, processing, and/or expression of the gene product. DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences. For example, site-directed mutagenesis may be used to insert new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, or introduce mutations, and so forth.

In another embodiment of the invention, natural, modified, or recombinant nucleic acid sequences encoding a polypeptide may be ligated to a heterologous sequence to encode a fusion protein. For example, to screen peptide libraries for inhibitors of the polypeptide's activity (for example, enzymatic activity), it may be useful to encode a chimeric protein that can be recognized by a commercially available antibody. A fusion protein may also be engineered to contain a cleavage site located between the polypeptide encoding sequence and the heterologous protein sequence, so that the polypeptide of interest may be cleaved and purified away from the heterologous moiety.

In another embodiment, sequences encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) may be synthesized, in whole or in part, using chemical methods well known in the art [See for example, Caruthers et al., Nucl. Acids Res. Symp. Ser. 215 (1980); Horn et al., Nucl. Acids Res. Symp. Ser. 225 (1980)]. Alternatively, the protein itself may be produced using chemical methods to synthesize the amino acid sequence of the polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention), or a portion thereof. For example, peptide synthesis can be performed using various solid-phase techniques [Roberge et al., Science 269:202 (1995)] and automated synthesis may be achieved, for example, using the ABI 431A peptide synthesizer (PE Corporation, Norwalk, Conn.).

The newly synthesized peptide may be substantially purified by preparative high performance liquid chromatography [See for example, Creighton, T. (1983) Proteins, Structures and Molecular Principles, WH Freeman and Co., New York, N.Y.]. The composition of the synthetic peptides may be confirmed by amino acid analysis or sequencing (for example, the Edman degradation procedure; or Creighton, supra). Additionally, the amino acid sequence of the polypeptide of interest or any part thereof, may be changed during direct synthesis and/or combined using chemical methods with sequences from other proteins, or any part thereof, to produce a variant polypeptide.

In order to express a biologically active polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) or RNA, the nucleotide sequences encoding the polypeptide or functional equivalents may be inserted into an appropriate expression vector, that is, a vector that contains the necessary elements for the transcription and translation of the inserted coding sequence.

Methods that are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding polypeptides (for example, a polypeptide encoded by a nucleic acid of the present invention) and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. Such techniques are described in Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y., and Ausubel, F. M. et al. (1989) Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.

A variety of expression vector/host systems may be utilized to contain and express sequences encoding a polypeptide of interest. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with virus expression vectors (for example, baculovirus); plant cell systems transformed with virus expression vectors (for example, cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV; brome mosaic virus) or with bacterial expression vectors (for example, Ti or pBR322 plasmids); or animal cell systems.

The “control elements” or “regulatory sequences” are those non-translated regions of the vector (for example, enhancers, promoters, 5′ and 3′ untranslated regions) that interact with host cellular proteins to carry out transcription and translation. Such elements may vary in their strength and specificity. Depending on the vector system and host utilized, any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used. For example, when cloning in bacterial systems, inducible promoters such as the hybrid lacZ promoter of the BLUESCRIPT phagemid (Stratagene; La Jolla, Calif.) or PSPORT1 plasmid (Life Technologies, Inc.; Rockville, Md.) and the like may be used. The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers derived from the genomes of plant cells (for example, heat shock, RUBISCO, and storage protein genes) or from plant viruses (for example, viral promoters or leader sequences) may be cloned into the vector. In mammalian cell systems, promoters from mammalian genes or from mammalian viruses are preferable. If it is necessary to generate a cell line that contains multiple copies of the sequence encoding a polypeptide, vectors based on SV40 or EBV may be used with an appropriate selectable marker.

In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the polypeptide of interest. For example, when large quantities of the polypeptide are needed for the induction of antibodies, vectors that direct high level expression of fusion proteins that are readily purified may be used. Such vectors include, but are not limited to, the multifunctional E. coli cloning and expression vectors such as BLUESCRIPT phagemid (Stratagene; La Jolla, Calif.), in which the sequence encoding the polypeptide of interest may be ligated into the vector in frame with sequences for the amino-terminal Met and the subsequent 7 residues of beta-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke and Schuster, J. Biol. Chem. 264:5503 [1989]); and the like. pGEMX vectors (Promega Corporation; Madison, Wis.) may also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. Proteins made in such systems may be designed to include heparin, thrombin, or factor XA protease cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at will.

In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase, and PGH may be used. For reviews, see, for example, Ausubel et al. (supra) and Grant et al., Methods Enzymol. 153:516 (1987).

In cases where plant expression vectors are used, the expression of sequences encoding polypeptides may be driven by any of a number of promoters. In a preferred embodiment, plant vectors are created using a recombinant plant virus containing a recombinant plant viral nucleic acid, as described in PCT publication WO 96/40867. Subsequently, the recombinant plant viral nucleic acid that contains one or more non-native nucleic acid sequences may be transcribed or expressed in the infected tissues of the plant host and the product of the coding sequences may be recovered from the plant, as described in WO 99/36516.

An important feature of this embodiment is the use of recombinant plant viral nucleic acids that contain one or more non-native subgenomic promoters capable of transcribing or expressing adjacent nucleic acid sequences in the plant host and that result in replication and local and/or systemic spread in a compatible plant host. The recombinant plant viral nucleic acids have substantial sequence homology to plant viral nucleotide sequences and may be derived from an RNA, DNA, cDNA or a chemically synthesized RNA or DNA. A partial listing of suitable viruses is described below.

The first step in producing recombinant plant viral nucleic acids according to this particular embodiment is to modify the plant viral nucleotide sequence by known conventional techniques such that one or more non-native subgenomic promoters are inserted into the plant viral nucleic acid without destroying the biological function of the plant viral nucleic acid. The native coat protein coding sequence may be deleted in some embodiments, placed under the control of a non-native subgenomic promoter in other embodiments, or retained in a further embodiment. If it is deleted or otherwise inactivated, a non-native coat protein gene is inserted under control of one of the non-native subgenomic promoters, or optionally under control of the native coat protein gene subgenomic promoter. The non-native coat protein is capable of encapsidating the recombinant plant viral nucleic acid to produce a recombinant plant virus. Thus, the recombinant plant viral nucleic acid contains a coat protein coding sequence, which may be a native or a non-native coat protein coding sequence, under control of one of the native or non-native subgenomic promoters. The coat protein is involved in the systemic infection of the plant host.

Some of the viruses that meet this requirement include viruses from the tobamovirus group such as Tobacco Mosaic virus (TMV), Ribgrass Mosaic Virus (RGM), Cowpea Mosaic virus (CMV), Alfalfa Mosaic virus (AMV), Cucumber Green Mottle Mosaic virus Watermelon Strain (CGMMV-W) and Oat Mosaic virus (OMV) and viruses from the brome mosaic virus group such as Brome Mosaic virus (BMV), Broad Bean Mottle virus and Cowpea Chlorotic mottle virus. Additional suitable viruses include Rice Necrosis virus (RNV), and geminiviruses such as Tomato Golden Mosaic virus (TGMV), Cassava Latent virus (CLV) and Maize Streak virus (MSV). However, the invention should not be construed as limited to using these particular viruses, but rather the method of the present invention is contemplated to include all plant viruses at a minimum.

Other embodiments of plant vectors used for the expression of sequences encoding polypeptides include, for example, viral promoters such as the 35S and 19S promoters of CaMV used alone or in combination with the omega leader sequence from TMV [Takamatsu, EMBO J. 6:307 (1987)]. Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters may be used [Coruzzi et al., EMBO J. 3:1671 (1984); Broglie et al., Science 224:838 (1984); and Winter et al., Results Probl. Cell Differ. 17:85 (1991)]. These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. Such techniques are described in a number of generally available reviews (see, for example, Hobbs, S. or Murry, L. E. in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York, N.Y.; pp. 191-196).

The present invention further provides transgenic plants comprising the polynucleotides of the present invention. In some preferred embodiments, Agrobacterium mediated transfection is utilized to create transgenic plants. Since most dicotyledonous plants are natural hosts for Agrobacterium, almost every dicotyledonous plant may be transformed by Agrobacterium in vitro. Although monocotyledonous plants, and in particular, cereals and grasses, are not natural hosts to Agrobacterium, work to transform them using Agrobacterium has also been carried out [Hooykaas-Van Slogteren et al. (1984) Nature 311:763-764]. Plant genera that may be transformed by Agrobacterium include Arabidopsis, Chrysanthemum, Dianthus, Gerbera, Euphorbia, Pelargonium, Ipomoea, Passiflora, Cyclamen, Malus, Prunus, Rosa, Rubus, Populus, Santalum, Allium, Lilium, Narcissus, Ananas, Arachis, Phaseolus and Pisum.

For transformation with Agrobacterium, disarmed Agrobacterium cells are transformed with recombinant Ti plasmids of Agrobacterium tumefaciens or Ri plasmids of Agrobacterium rhizogenes (such as those described in U.S. Pat. No. 4,940,838, the entire contents of which are herein incorporated by reference). The nucleic acid sequence of interest is then stably integrated into the plant genome by infection with the transformed Agrobacterium strain. For example, heterologous nucleic acid sequences have been introduced into plant tissues using the natural DNA transfer system of Agrobacterium tumefaciens and Agrobacterium rhizogenes bacteria [for review, see Klee et al. (1987) Ann. Rev. Plant Phys. 38:467-486].

There are three common methods to transform plant cells with Agrobacterium. The first method is co-cultivation of Agrobacterium with cultured isolated protoplasts. This method requires an established culture system that allows culturing protoplasts and plant regeneration from cultured protoplasts. The second method is transformation of cells or tissues with Agrobacterium. This method requires (a) that the plant cells or tissues can be transformed by Agrobacterium and (b) that the transformed cells or tissues can be induced to regenerate into whole plants. The third method is transformation of seeds, apices or meristems with Agrobacterium. This method requires micropropagation.

The efficiency of transformation by Agrobacterium may be enhanced by using a number of methods known in the art. For example, the addition of a natural wound response molecule such as acetosyringone (AS) to the Agrobacterium culture has been shown to enhance transformation efficiency with Agrobacterium tumefaciens [Shahla et al., (1987) Plant Molec. Biol. 8:291-298]. Alternatively, transformation efficiency may be enhanced by wounding the target tissue to be transformed. Wounding of plant tissue may be achieved, for example, by punching, maceration, bombardment with microprojectiles, etc. [See e.g., Bidney et al., (1992) Plant Molec. Biol. 18:301-313].

In still further embodiments, the plant cells are transfected with vectors via particle bombardment (i.e., with a gene gun). Particle mediated gene transfer methods are known in the art, are commercially available, and include, but are not limited to, the gas driven gene delivery instrument described in McCabe, U.S. Pat. No. 5,584,807, the entire contents of which are herein incorporated by reference. This method involves coating the nucleic acid sequence of interest onto heavy metal particles, and accelerating the coated particles under the pressure of compressed gas for delivery to the target tissue.

Other particle bombardment methods are also available for the introduction of heterologous nucleic acid sequences into plant cells. Generally, these methods involve depositing the nucleic acid sequence of interest upon the surface of small, dense particles of a material such as gold, platinum, or tungsten. The coated particles are themselves then coated onto either a rigid surface, such as a metal plate, or onto a carrier sheet made of a fragile material such as mylar. The coated sheet is then accelerated toward the target biological tissue. The use of the flat sheet generates a uniform spread of accelerated particles that maximizes the number of cells receiving particles under uniform conditions, resulting in the introduction of the nucleic acid sample into the target tissue.

An insect system may also be used to express polypeptides (for example, a polypeptide encoded by a nucleic acid of the present invention). For example, in one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. The sequences encoding a polypeptide of interest may be cloned into a non-essential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the nucleic acid sequence encoding the polypeptide of interest will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein. The recombinant viruses may then be used to infect, for example, S. frugiperda cells or Trichoplusia larvae in which the polypeptide may be expressed [Engelhard et al., Proc. Nat. Acad. Sci. 91:3224 (1994)].

In mammalian host cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, sequences encoding polypeptides may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain a viable virus that is capable of expressing the polypeptide in infected host cells [Logan and Shenk, Proc. Natl. Acad. Sci., 81:3655 (1984)]. In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.

Specific initiation signals may also be used to achieve more efficient translation of sequences encoding the polypeptide of interest. Such signals include the ATG initiation codon and adjacent sequences. In cases where sequences encoding the polypeptide of interest, its initiation codon, and upstream sequences are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon should be provided. Furthermore, the initiation codon should be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers that are appropriate for the particular cell system that is used, such as those described in the literature [Scharf et al., Results Probl. Cell Differ., 20:125 (1994)].
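The reading-frame requirement stated above can be illustrated with a minimal Python sketch. It is an illustration only and not part of the disclosed methods: the function name `translatable_in_frame` is hypothetical, the stop codons are those of the standard genetic code, and the check assumes the insert is supplied as a DNA coding strand that begins at the intended initiation codon.

```python
# Hypothetical helper: checks that an insert starts with ATG, is a whole
# number of codons, and has no stop codon before its final codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def translatable_in_frame(insert: str) -> bool:
    """Return True if the whole insert can be read in frame from its ATG."""
    seq = insert.upper()
    if not seq.startswith("ATG") or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A stop codon anywhere except the last position truncates translation.
    return all(codon not in STOP_CODONS for codon in codons[:-1])


if __name__ == "__main__":
    print(translatable_in_frame("ATGGCCTAA"))  # in frame, stop only at the end
    print(translatable_in_frame("ATGTAAGCC"))  # premature stop codon
```

A real construct would also need the vector-derived upstream sequence to be taken into account; the sketch checks only the insert itself.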

In addition, a host cell strain may be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing that cleaves a “prepro” form of the protein may also be used to facilitate correct insertion, folding and/or function. Different host cells such as CHO, HeLa, MDCK, HEK293, and WI38, that have specific cellular machinery and characteristic mechanisms for such post-translational activities, may be chosen to ensure the correct modification and processing of the foreign protein.

For long-term, high-yield production of recombinant proteins, stable expression is preferred. For example, cell lines that stably express the polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention) may be transformed using expression vectors that may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in enriched media before they are switched to selective media. The purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells that successfully express the introduced sequences. Resistant clones of stably transformed cells may be proliferated using tissue culture techniques appropriate to the cell type.

Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase [Wigler et al., Cell 11:223 (1977)] and adenine phosphoribosyltransferase [Lowy et al., Cell 22:817 (1980)] genes that can be employed in tk⁻ or aprt⁻ cells, respectively. Also, antimetabolite, antibiotic, or herbicide resistance can be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate [Wigler et al., Proc. Natl. Acad. Sci., 77:3567 (1980)]; npt, which confers resistance to the aminoglycosides neomycin and G-418 [Colbere-Garapin et al., J. Mol. Biol., 150:1 (1981)]; and als or pat, which confer resistance to chlorsulfuron and phosphinothricin, respectively (Murry, supra). Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine [Hartman and Mulligan, Proc. Natl. Acad. Sci., 85:8047 (1988)]. Recently, the use of visible markers has gained popularity with such markers as anthocyanins, β-glucuronidase (GUS) and its substrates, and luciferase and its substrate luciferin, being widely used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system [Rhodes et al., Methods Mol. Biol., 55:121 (1995)].

Although the presence/absence of marker gene expression suggests that the gene of interest is also present, its presence and expression may need to be confirmed. For example, if the sequence encoding a polypeptide is inserted within a marker gene sequence, recombinant cells containing sequences encoding the polypeptide can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with a sequence encoding the polypeptide under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.

Alternatively, host cells that contain the nucleic acid sequence encoding the polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention) and express the polypeptide may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations and protein bioassay or immunoassay techniques that include membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein.

The presence of polynucleotide sequences encoding a polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention) can be detected by DNA-DNA or DNA-RNA hybridization or amplification using probes or portions or fragments of polynucleotides encoding the polypeptide. Nucleic acid amplification based assays involve the use of oligonucleotides or oligomers based on the sequences encoding the polypeptide to detect transformants containing DNA or RNA encoding the polypeptide. As used herein “oligonucleotides” or “oligomers” refer to a nucleic acid sequence of at least about 10 nucleotides and as many as about 60 nucleotides, preferably about 15 to 30 nucleotides, and more preferably about 20-25 nucleotides, that can be used as a probe or amplimer.
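The length ranges in this definition can be expressed as a short Python sketch. It is illustrative only: the function name `classify_oligo_length` and the tier labels are hypothetical, and the hard cut-offs are a simplification, since the text qualifies each bound with “about”.

```python
# Hypothetical helper: maps an oligonucleotide length to the nested
# ranges given in the definition (10-60, 15-30, 20-25 nucleotides).
def classify_oligo_length(seq: str) -> str:
    """Classify a candidate probe or amplimer by its length in nucleotides."""
    n = len(seq)
    if 20 <= n <= 25:
        return "more preferred"
    if 15 <= n <= 30:
        return "preferred"
    if 10 <= n <= 60:
        return "within definition"
    return "outside definition"


if __name__ == "__main__":
    for length in (8, 12, 18, 22, 45, 70):
        print(length, classify_oligo_length("A" * length))
```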

A variety of protocols for detecting and measuring the expression of a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), using either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on the polypeptide is preferred, but a competitive binding assay may be employed. These and other assays are described, among other places, in Hampton et al., Serological Methods, a Laboratory Manual, APS Press, St Paul, Minn. (1990) and Maddox et al., J. Exp. Med., 158:1211 (1983).

A wide variety of labels and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides encoding a polypeptide of interest include oligonucleotide labeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the sequences encoding the polypeptide, or any portions thereof, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety of commercially available kits from Pharmacia & Upjohn (Kalamazoo, Mich.), Promega Corporation (Madison, Wis.) and U.S. Biochemical Corp. (Cleveland, Ohio). Suitable reporter molecules or labels that may be used include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles, and the like.

Host cells transformed with nucleotide sequences encoding a polypeptide of interest may be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The protein produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides that encode the polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention) may be designed to contain signal sequences that direct secretion of the polypeptide through a prokaryotic or eukaryotic cell membrane. Other recombinant constructions may be used to join sequences encoding the polypeptide to nucleotide sequence encoding a polypeptide domain that will facilitate purification of soluble proteins. Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp., Seattle, Wash.). The inclusion of cleavable linker sequences such as those specific for Factor XA or enterokinase (available from Invitrogen; San Diego, Calif.) between the purification domain and the polypeptide of interest may be used to facilitate purification. One such expression vector provides for expression of a fusion protein containing the polypeptide of interest and a nucleic acid encoding 6 histidine residues preceding a thioredoxin or an enterokinase cleavage site. The histidine residues facilitate purification on IMIAC (immobilized metal ion affinity chromatography) as described in Porath et al., Prot. Exp. Purif., 3:263 (1992), while the enterokinase cleavage site provides a means for purifying the polypeptide from the fusion protein. A discussion of vectors that contain fusion proteins is provided in Kroll et al., DNA Cell Biol., 12:441 (1993).

In addition to recombinant production, fragments of the polypeptide of interest may be produced by direct peptide synthesis using solid-phase techniques [Merrifield, J. Am. Chem. Soc., 85:2149 (1963)]. Protein synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using the Applied Biosystems 431A peptide synthesizer (Perkin Elmer). Various fragments of the polypeptide may be chemically synthesized separately and combined using chemical methods to produce the full-length molecule.

V. Alteration of Gene Expression

It is contemplated that the polynucleotides of the present invention (for example, SEQ ID NOs:1-7554) may be utilized to either increase or decrease the level of corresponding mRNA and/or protein in transfected cells as compared to the levels in wild-type cells. Accordingly, in some embodiments, expression in plants by the methods described above leads to the overexpression of the polypeptide of interest in transgenic plants, plant tissues, or plant cells. The present invention is not limited to any particular mechanism. Indeed, an understanding of a mechanism is not required to practice the present invention. However, it is contemplated that overexpression of the polynucleotides of the present invention will alter the expression of the gene comprising the nucleic acid sequence of the present invention. In some embodiments, more than one of SEQ ID NOs:1-7554 are expressed in a given plant. The sequences may be contained in the same vector or in different vectors. The sequences can influence the same metabolic trait (e.g., fatty acid metabolism or one of the other traits discussed in more detail below) or multiple metabolic traits (e.g., fatty acid and carbohydrate metabolism).

In other embodiments of the present invention, the polynucleotides are utilized to decrease the level of the protein or mRNA of interest in transgenic plants, plant tissues, or plant cells as compared to wild-type plants, plant tissues, or plant cells. One method of reducing protein expression utilizes expression of antisense transcripts (for example, U.S. Pat. Nos. 6,031,154; 5,453,566; 5,451,514; 5,859,342; and 4,801,340, each of which is incorporated herein by reference). Antisense RNA has been used to inhibit plant target genes in a tissue-specific manner [for example, Van der Krol et al., Biotechniques 6:958-976 (1988)]. Antisense inhibition has been shown using the entire cDNA sequence as well as a partial cDNA sequence [for example, Sheehy et al., Proc. Natl. Acad. Sci. USA 85:8805-8809 (1988); Cannon et al., Plant Mol. Biol. 15:39-47 (1990)]. There is also evidence that 3′ non-coding sequence fragments and 5′ coding sequence fragments, containing as few as 41 base-pairs of a 1.87 kb cDNA, can play important roles in antisense inhibition [Ch'ng et al., Proc. Natl. Acad. Sci. USA 86:10006-10010 (1989)].

Accordingly, in some embodiments, the nucleic acids of the present invention (for example, SEQ ID NOs:1-7554, and fragments and variants thereof) are oriented in a vector and expressed so as to produce antisense transcripts. To accomplish this, a nucleic acid segment from the desired gene is cloned and operably linked to a promoter such that the antisense strand of RNA will be transcribed. The expression cassette is then transformed into plants and the antisense strand of RNA is produced. The nucleic acid segment to be introduced generally will be substantially identical to at least a portion of the endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expression. The vectors of the present invention can be designed such that the inhibitory effect applies to other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
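As an illustration of what cloning a segment in the antisense orientation means at the sequence level, the following Python sketch derives the antisense transcript from a sense-strand DNA segment. It is a simplified illustration, not a description of any vector of the present invention: the function names are hypothetical, the input is assumed to contain only A, C, G and T, and promoter and terminator sequences are ignored.

```python
# Hypothetical helpers: the antisense transcript is the reverse
# complement of the sense (coding) strand, written as RNA.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_transcript(sense_dna: str) -> str:
    """Return the RNA produced when the segment is transcribed in antisense orientation."""
    return reverse_complement(sense_dna).upper().replace("T", "U")


if __name__ == "__main__":
    segment = "ATGGCC"                    # sense strand; the mRNA would read AUGGCC
    print(reverse_complement(segment))    # GGCCAT
    print(antisense_transcript(segment))  # GGCCAU, complementary to AUGGCC
```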

Furthermore, for antisense suppression, the introduced sequence also need not be full length relative to either the primary transcription product or fully processed mRNA. Generally, higher homology can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of non-coding segments may be equally effective. Normally, a sequence of between about 30 or 40 nucleotides and up to about the full length of the coding region should be used, although a sequence of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a sequence of at least about 500 nucleotides is especially preferred.

Catalytic RNA molecules or ribozymes can also be used to inhibit expression of the target gene or genes. It is possible to design ribozymes that specifically pair with virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. In carrying out this cleavage, the ribozyme is not itself changed, and is thus capable of recycling and cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers RNA-cleaving activity upon them, thereby increasing the activity of the constructs.

A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of small circular RNAs that are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, Solanum nodiflorum mottle virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff, et al., Nature 334:585-591 (1988).

Another method of reducing protein expression utilizes the phenomenon of cosuppression or gene silencing (for example, U.S. Pat. Nos. 6,063,947; 5,686,649; and 5,283,184; each of which is incorporated herein by reference). The phenomenon of cosuppression has also been used to inhibit plant target genes in a tissue-specific manner. Cosuppression of an endogenous gene using a full-length cDNA sequence as well as a partial cDNA sequence (730 bp of a 1770 bp cDNA) is known [for example, Napoli et al., Plant Cell 2:279-289 (1990); van der Krol et al., Plant Cell 2:291-299 (1990); Smith et al., Mol. Gen. Genetics 224:477-481 (1990)]. Accordingly, in some embodiments the nucleic acids (for example, SEQ ID NOs:1-7554, and fragments and variants thereof) from one species of plant are expressed in another species of plant to effect cosuppression of a homologous gene. Generally, where inhibition of expression is desired, some transcription of the introduced sequence occurs. The effect may occur where the introduced sequence contains no coding sequence per se, but only intron or untranslated sequences homologous to sequences present in the primary transcript of the endogenous sequence. The introduced sequence generally will be substantially identical to the endogenous sequence intended to be repressed. This minimal identity will typically be greater than about 65%, but a higher identity might exert a more effective repression of expression of the endogenous sequences. Substantially greater identity of more than about 80% is preferred, though about 95% to absolute identity would be most preferred. As with antisense regulation, the effect should apply to any other proteins within a similar family of genes exhibiting homology or substantial homology.
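The identity thresholds given above (greater than about 65%, more than about 80%, and about 95% to absolute identity) can be illustrated with a minimal Python sketch. It rests on loudly stated assumptions: the two sequences are already aligned and of equal length with no gaps, identity is counted position by position, and the function names and tier labels are hypothetical. A practical comparison would use a proper alignment tool, and the cut-offs are only “about” values in the text.

```python
# Hypothetical helpers: ungapped percent identity between two pre-aligned
# sequences, and the identity tier described for cosuppression.
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def cosuppression_tier(identity: float) -> str:
    """Map a percent identity to the tiers named in the text."""
    if identity >= 95.0:
        return "most preferred"
    if identity > 80.0:
        return "preferred"
    if identity > 65.0:
        return "minimal"
    return "below minimal"


if __name__ == "__main__":
    introduced = "ACGTACGTAC"
    endogenous = "ACGTACGTAA"  # one mismatch in ten positions
    pid = percent_identity(introduced, endogenous)
    print(pid, cosuppression_tier(pid))
```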

For cosuppression, the introduced sequence in the expression cassette, needing less than absolute identity, also need not be full length, relative to either the primary transcription product or fully processed mRNA. This may be preferred to avoid concurrent production of some plants that are overexpressers. A higher identity in a shorter than full length sequence compensates for a longer, less identical sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and identity of non-coding segments will be equally effective. Normally, a sequence of the size ranges noted above for antisense regulation is used.

VI. Expression of Sequences Producing Altered Metabolic Characteristics

The present invention provides nucleic acid sequences involved in providing altered metabolic characteristics in plants. Plants transformed with viral vectors comprising the nucleic acid sequences of the present invention were screened for an altered metabolic characteristic. The results are presented in FIG. 6. Accordingly, in some embodiments, the present invention provides nucleic acid sequences that produce an altered metabolic characteristic when expressed in a plant (SEQ ID NOs:1-1165, 3703-4153 and 7389-7554, FIG. 1). The present invention is not limited to the particular nucleic acid sequences listed. Indeed, the present invention encompasses nucleic acid sequences (including sequences of the same, shorter, and longer lengths) that hybridize to the listed nucleic acid sequences under conditions ranging from low to high stringency and that also cause the altered metabolic characteristic. These sequences are conveniently identified by insertion into GENEWARE™ vectors and expression in plants as detailed in the examples.

The present invention is not limited to any particular mechanism. Indeed, an understanding of a mechanism is not required to practice the present invention. However, it is contemplated that the expression of genes comprising the nucleic acid sequences of the present invention affects biochemical pathways that lead to the alteration of metabolic characteristics of the present invention. For example, the expression of a genetic function that affects the acyl acetylglycerol pathway may lead to the production of altered esters such as 2-stearoyl-1-acetylglycerol and 2-eicosanoyl-1-acetylglycerol. The present invention is not limited to alterations of any particular metabolic pathway. Indeed, the alteration of a variety of metabolic pathways is contemplated, including, but not limited to, the pathways involved in the production of the following compounds: acids, fatty acids, branched fatty acids, alcohols, alkaloids, alkenes, alkynes, amino acids, carbohydrates, esters, glycerol, phenols, sterols, terpenes, isoprenoids, ketones, and quinones.

In some embodiments, the sequences are operably linked to a plant promoter or provided in a vector as described in more detail above. The present invention also contemplates plants transformed or transfected with these sequences, as well as seeds, fruit, leaves, stems and roots from such transfected plants. Furthermore, the sequences can be expressed in either sense or antisense orientation. In particularly preferred embodiments, the sequences are at least 30 nucleotides in length, up to the full length of the corresponding gene. It is contemplated that sequences of less than full length (for example, greater than about 30 nucleotides) are useful for down regulation of gene expression via antisense or cosuppression. Suitable sequences are selected by chemically synthesizing the sequences, cloning into GENEWARE™ expression vectors, expressing in plants, and selecting plants with an altered metabolic characteristic.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of acids in plants. Examples of acids that can be altered according to the present invention include, but are not limited to, fumaric acid, malic acid, carbamic acid, glyceric acid, citric acid, ketoglutaric acid, quinic acid, shikimic acid, and sugar acids, such as gluconic acid and galacturonic acid. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of acids in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of fatty acids in plants. Examples of fatty acids that can be altered according to the present invention include, but are not limited to, tetradecanoic acid, hexadecanoic acid, heptadecanoic acid, octadecanoic acid, eicosanoic acid, docosanoic acid, 9-hexadecenoic acid, 6-octadecenoic acid, 9-octadecenoic acid, 7,10-hexadecadienoic acid, 9,12-octadecadienoic acid, 9,12,15-octadecatrienoic acid, and 7,10,13-docosatrienoic acid. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of fatty acids in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of fatty acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of branched fatty acids in plants. Examples of branched fatty acids that can be altered according to the present invention include, but are not limited to, 14-methylhexadecanoic acid, 16-methylheptadecanoic acid, 17-methylheptadecanoic acid, 20-methylheneicosanoic acid, and 3,7,11,15-tetramethylhexadecanoic acid. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of branched fatty acids in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of branched fatty acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of alcohols in plants. Examples of alcohols that can be altered according to the present invention include, but are not limited to, octadecanol, phytol, valereneol, and sugar alcohols, such as inositol and mannitol. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of alcohols in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of alcohol production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of alkaloids and other bases in plants. Examples of alkaloids and other bases that can be altered according to the present invention include, but are not limited to, nicotine, nornicotine, and 1,4-butanediamine. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of alkaloids (or other bases) in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of alkaloid and other base production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of alkenes and alkynes (unsaturated hydrocarbons) in plants. Examples of alkenes and alkynes that can be altered according to the present invention include, but are not limited to, limonene, dimethylcyclooctadiene, 4-methyldecene, eicosene, tetramethylhexadecene, dehydroisolongifolene, and squalene. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of alkenes and alkynes in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of alkene and alkyne production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of amino acids and related compounds in plants. Examples of amino acids and related compounds that can be altered according to the present invention include, but are not limited to, proline, glycine, alanine, serine, aspartic acid, glutamic acid, lysine, tyrosine, phenylalanine, valine, threonine, arginine, glutamine, tryptophan, isoleucine, and 5-oxo-proline. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of amino acids in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of the production of amino acids and related compounds in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of carbohydrates in plants. Examples of carbohydrates that can be altered according to the present invention include, but are not limited to, arabinose, xylose, glucose, fructose, galactose, mannose, rhamnose, and sucrose. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of carbohydrates in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of carbohydrate production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of esters in plants. Examples of esters that can be altered according to the present invention include, but are not limited to, acylates, such as 2-steroyl-1-acetylglycerol and 2-eicosanoyl-1-acetylglycerol. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of esters in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of ester production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of glycerides in plants. Examples of glycerides that can be altered according to the present invention include, but are not limited to, glycerol palmitate, glycerol linoleate, and glyceryl linolenate. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of glycerides in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of glyceride production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of hydrocarbons (saturated) in plants. Examples of hydrocarbons that can be altered according to the present invention include, but are not limited to, eicosane, hentriacontane, 2-methyloctacosane, 3-methylnonacosane, 2-methyltriacontane, 3-methylhentriacontane, and 2-methyldotriacontane. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of hydrocarbons in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of hydrocarbon production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of phenols and related compounds in plants. Examples of phenols and related compounds that can be altered according to the present invention include, but are not limited to, caffeic acid and chlorogenic acid. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of phenols (and related compounds) in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of the production of phenols and related compounds in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of sterols, oxygenated terpenes, and other isoprenoids in plants. Examples of sterols, oxygenated terpenes, and other isoprenoids that can be altered according to the present invention include, but are not limited to, solanesol, cycloartenol, alpha-tocopherol, alpha-tocopherol quinone, beta-tocopherol, gamma-tocopherol, stigmastenol, stigmastatriene, campesterol, cholesterol, sitosterol, stigmasterol, methylene-lophenol, methylene-cycloartenol, dimethylergostadienol, fucosterol, ergostenone, stigmastadienol, and lanosterol. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of sterols, oxygenated terpenes, or other isoprenoids in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of sterol, oxygenated terpene, and other isoprenoid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of ketones and quinones in plants. Examples of ketones and quinones that can be altered according to the present invention include, but are not limited to, 3-phytolmenadione and alpha-tocopherol quinone. The alterations in metabolic profiles are preferably accomplished by expressing, in a plant, one or more of the nucleic acid sequences in FIG. 1 or 2 (or sequences that hybridize thereto) shown to alter the production of ketones and quinones in plants. In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of ketone and quinone production in a plant.

VII. Identification of Homologs to Sequences

The present invention also provides homologs and variants of the sequences described above, but which may not hybridize to the sequences described above under conditions ranging from low to high stringency. In some preferred embodiments, the homologous and variant sequences are operably linked to an exogenous promoter. FIG. 3 provides BLASTX search results from publicly available databases. The relevant sequences are identified by Accession numbers in these databases. FIG. 4 contains the top BLASTX hits (identified by Accession number) versus all the amino acid sequences in the Derwent™ biweekly database. FIG. 5 contains the top BLASTN hits (identified by Accession number) versus all the nucleotide sequences in the Derwent™ biweekly database.

In some embodiments, the present invention comprises homologous nucleic acid sequences (SEQ ID NOs:1166-3702 and 4154-7388) identified by screening an internal database with SEQ ID NOs:1-1165, 3703-4153 and 7389-7554 at a confidence level of Pz<1.00E-20. These sequences are provided in FIG. 2. The headers list the sequence identifier for the sequence that produced the actual phenotypic hit first and the sequence identifier for the homologous contig second.
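The thresholding step in this screen can be illustrated with a minimal Python sketch. It assumes hypothetical hit records of the form (query identifier, contig identifier, P value); the record layout, the function name, and the sample values are illustrative assumptions and do not describe the internal database or the software actually used.

```python
# Hypothetical sketch: keep only homolog hits below a P-value cutoff.
# The tuple layout and the sample values are illustrative assumptions.
P_CUTOFF = 1.00e-20  # threshold stated in the text (Pz<1.00E-20)

def filter_homologs(hits, cutoff=P_CUTOFF):
    """Return (query_id, contig_id) pairs whose P value is below the cutoff."""
    return [(query, contig) for query, contig, p in hits if p < cutoff]

example_hits = [
    ("query_A", "contig_1", 3.2e-45),  # kept
    ("query_A", "contig_2", 1.0e-05),  # rejected: too weak
    ("query_B", "contig_3", 9.9e-21),  # kept: just below the cutoff
]
kept = filter_homologs(example_hits)
```

Each retained pair corresponds to one header in the style described above: the hit sequence first, the homologous contig second.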

As will be understood by those skilled in the art, the present invention is not limited to the particular sequences of the homologs described above. Indeed, the present invention encompasses portions, fragments, and variants of the homologs as described above. Such variants, portions, and fragments can be produced and identified as described in Section III above. In particularly preferred embodiments, the present invention provides sequences that hybridize to SEQ ID NOs:1166-3702 and 4154-7388 under conditions ranging from low to high stringency. In other preferred embodiments, the present invention provides nucleic acid sequences that inhibit the binding of SEQ ID NOs:1166-3702 and 4154-7388 to their complements under conditions ranging from low to high stringency. Furthermore, as described above in Section IV, the homologs can be incorporated into vectors for expression in a variety of hosts, including transgenic plants.

Homolog contigs (FIG. 2; described in Example 16, Section D: Identification of Homologous Sequences) are formed from individual sequence runs belonging to clones whose sequences share a predefined level of nucleotide identity to each other and are presumably independent isolates of a single gene sequence. The clones composing any one homolog contig are the actual entities that are screened. If clones sharing homology to a particular hit sequence (FIG. 1) perform a very similar or identical function within the organism, then these clones should also result in metabolic alterations when tested in the metabolic screen. The data contained in FIG. 9 are provided as examples of the metabolic phenotype correlation between homolog clones. FIG. 9 shows the correlation between the homolog sequence pairs and the metabolic alterations observed in this invention. Biochemicals common to both clones are listed with the corresponding chemical and biochemical classes. The alterations, up-regulated or down-regulated, observed for these biochemicals are also reported.

EXAMPLES

Example 1 Construction of Tissue-Specific N. benthamiana cDNA Libraries

A. mRNA Isolation: Leaf, root, flower, meristem, and pathogen-challenged leaf cDNA libraries were constructed. Total RNA samples from 10.5 μg of the above tissues were isolated by TRIZOL reagent (Life Technologies, Inc.; Rockville, Md.). The typical yield of total RNA was 1 mg. PolyA⁺ RNA was purified from total RNA by DYNABEADS oligo (dT)₂₅. Purified mRNA was quantified by UV absorbance at OD₂₆₀. The typical yield of mRNA was 2% of total RNA. The purity was also determined by the ratio of OD₂₆₀/OD₂₈₀. The samples had OD₂₆₀/OD₂₈₀ values of 1.8-2.0.
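The yield and purity figures above lend themselves to a small worked example. The following Python sketch is illustrative only: it assumes that the 1.8-2.0 range refers to the OD₂₆₀/OD₂₈₀ ratio, and the function names and sample absorbance readings are hypothetical.

```python
# Illustrative sketch; function names and sample readings are assumptions.

def purity_ratio(od260, od280):
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    return od260 / od280

def is_acceptable(od260, od280, low=1.8, high=2.0):
    """True if the OD260/OD280 ratio falls in the range reported in the text."""
    return low <= purity_ratio(od260, od280) <= high

def expected_mrna_ug(total_rna_ug, fraction=0.02):
    """Expected mRNA mass given the typical 2% yield from total RNA."""
    return total_rna_ug * fraction

# A typical 1 mg (1000 ug) total RNA preparation would give about 20 ug mRNA.
typical_mrna = expected_mrna_ug(1000.0)
```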

B. cDNA Synthesis: cDNA was synthesized from mRNA using the SUPERSCRIPT plasmid system (Life Technologies, Inc.; Rockville, Md.) with cloning sites of NotI at the 3′ end and SalI at the 5′ end. After fractionation through a gel column to eliminate adapter fragments and short sequences, cDNA was cloned into both GENEWARE™ vector p1057 NP and phagemid vector pSPORT in the multiple cloning region between NotI and XhoI sites. Over 20,000 recombinants were obtained for all of the tissue-specific libraries.

C. Library Analysis: The quality of the libraries was evaluated by checking the insert size and percentage from 24 representative clones. Overall, the average insert size was above 1 kb, and the recombinant percentage was >95%.

Example 2 Construction of Normalized N. benthamiana cDNA Library in GENEWARE™ Vectors

A. cDNA synthesis. A pooled RNA source from the tissues described above was used to construct a normalized cDNA library. Total RNA samples were pooled in equal amounts first, then polyA⁺ RNA was isolated by DYNABEADS oligo (dT)₂₅. The first strand cDNA was synthesized by the Smart III system (Clontech; Palo Alto, Calif.). During the synthesis, adapter sequences with Sfi1a and Sfi1b sites were introduced by the polyA priming at the 3′ end and by the template switch mechanism at the 5′ end (Clontech; Palo Alto, Calif.). Eight μg first strand cDNA was synthesized from 24 μg mRNA. The yield and size were determined by UV absorbance and agarose gel electrophoresis.

B. Construction of Genomic DNA driver. Genomic DNA driver was constructed by immobilizing biotinylated DNA fragments onto streptavidin-coated magnetic beads. Fifty μg genomic DNA was digested by EcoR1 and BamH1 followed by fill-in reaction using biotin-21-dUTP. The biotinylated fragments were denatured by boiling and immobilized onto DYNABEADS by the conjugation of streptavidin and biotin.

C. Normalization Procedure. Six μg of the first strand cDNA was hybridized to 1 μg of genomic DNA driver in 100 μl of hybridization buffer (6×SSC, 0.1% SDS, 1× Denhardt's buffer) for 48 hours at 65° C. with constant rotation. After hybridization, the cDNA bound on genomic DNA beads was washed 3 times with 20 μl 1×SSC/0.1% SDS at 65° C. for 15 min and once with 0.1×SSC at room temperature. The cDNA bound to the beads was then eluted in 10 μl of freshly made 0.1N NaOH and purified using a QIAGEN DNA purification column (QIAGEN GmbH; Hilden, Germany), which yielded 110 ng of normalized cDNA fragments. The normalized first strand cDNA was converted to double strand cDNA in 4 cycles of PCR with Smart primers annealed to the 3′ and 5′ end adapter sequences.

D. Evaluation of normalization efficiency. Ninety-six non-redundant cDNA clones selected from a randomly sequenced pool of 500 clones of a previously constructed whole seedling library were used to construct a nylon array. One hundred ng of the normalized cDNA fragments and of the non-normalized fragments were radioactively labeled with ³²P and hybridized to DNA array nylon filters. The hybridization images and intensity data were acquired by a PHOSPHORIMAGER (Amersham Pharmacia Biotech; Chicago, Ill.). Since the 96 clones on the nylon arrays represent different abundance classes of genes, the variance of hybridization intensity among these genes on the filter was measured by standard deviation before and after normalization. The results indicated that by using this type of normalization approach, a 1000-fold reduction in variance among this set of genes could be achieved.
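The comparison of intensity spread before and after normalization can be sketched in Python as follows. The text does not state exactly how the fold reduction was computed, so the ratio of sample standard deviations used here is an assumption, as are the function name and the toy intensity values.

```python
import statistics

# Hypothetical sketch: the use of a simple ratio of sample standard deviations
# is an assumption; the text does not specify the exact calculation.
def spread_reduction(before, after):
    """Fold reduction in the standard deviation of hybridization intensities."""
    sd_before = statistics.stdev(before)
    sd_after = statistics.stdev(after)
    return sd_before / sd_after

# Toy intensities for three array spots (not data from the experiment).
non_normalized = [0.0, 10.0, 20.0]
normalized = [9.0, 10.0, 11.0]
fold = spread_reduction(non_normalized, normalized)
```

In practice the two input lists would hold the 96 spot intensities read from the non-normalized and normalized filters, respectively.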

E. Cloning of normalized cDNA into GENEWARE™ vector. The normalized cDNA fragments were digested by Sfi1 endonuclease, which recognizes 8-bp sites with variable sequences in the middle 4 nucleotides. After size fractionation, the cDNA was ligated into GENEWARE™ vector p1057 NP in antisense orientation and transformed into DH5α cells. Over 50,000 recombinants were obtained for this normalized library. The percentage of insert and size were evaluated by Sfi digestion of 96 randomly picked clones followed by electrophoresis on a 1% agarose gel. The average insert size was 1.5 kb, and the percentage of insert was 98%, with vector-only clones of <2%.

F. Sequence analysis of normalized cDNA library. Two plates of 96 randomly picked clones were sequenced from the 5′ end of the cDNA inserts. One hundred ninety-two quality sequences were obtained after trimming of vector sequences and other standard quality checking and filtering procedures, and were subjected to BLASTX search in DNA and protein databases. Over 40% of these sequences had no hit in the databases. Clustering analysis was conducted based on accession numbers of BLASTX matches among the 112 sequences that had hits in the databases. Only three genes (tumor-related protein, citrin, and rubit) appeared twice. All other members in this group appeared only once. This was a strong indication that this library is well-normalized. Sequence analysis also revealed that 68% of these 192 sequences had putative open reading frames using the ORF finder program (as described above), indicating possible full-length cDNA.
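The counts reported above are internally consistent: 192 − 112 = 80 sequences without a hit, or about 41.7%, which matches "over 40%". The redundancy check itself can be sketched in Python as a tally of accession numbers. The function name, the use of None to mark a sequence without a hit, and the dictionary layout are illustrative assumptions.

```python
from collections import Counter

# Hypothetical sketch of a redundancy tally based on BLASTX accession numbers.
def summarize_hits(accessions):
    """accessions: one entry per sequence; None marks a sequence with no hit."""
    total = len(accessions)
    hits = [acc for acc in accessions if acc is not None]
    counts = Counter(hits)
    # Accessions matched by more than one clone indicate redundancy.
    repeated = {acc: n for acc, n in counts.items() if n > 1}
    return {
        "total": total,
        "with_hit": len(hits),
        "no_hit_fraction": (total - len(hits)) / total,
        "repeated": repeated,
    }
```

A well-normalized library should give a short `repeated` dictionary relative to the number of sequences with hits, as with the three duplicated genes among 112 reported here.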

Example 3 Rice cDNA Library Construction in GENEWARE™ Vectors

Oryza sativa var. Indica IR-7 was grown in greenhouses under standard conditions (12/12 photoperiod, 29° C. daytime temp., 24° C. night temp.). The following types of tissue were harvested, immediately frozen on dry ice and stored at −80° C.: young leaves (20 days post sowing), mature leaves and panicles (122 days post sowing). Mature and immature root tissue (either 122 or 20 days post sowing) was harvested, rinsed in ddH₂O to remove soil, frozen on dry ice and stored at −80° C.

The following standard method (Life Technologies) was used for generation of cDNA and cloning. High quality total RNA was purified from target tissues using Trizol (LTI) reagent. mRNA was purified by binding to oligo (dT) and subsequent elution. Quality of mRNA samples is essential to cDNA library construction and was monitored spectrophotometrically and via gel electrophoresis. 2-5 μg of mRNA was primed with an oligo (dT)-NotI primer and cDNA was synthesized (no isotope was used in cDNA synthesis). SalI adaptors were ligated to the cDNA, which was then subjected to digestion with NotI. Restriction fragments were fractionated based on size and the first 10 fractions were measured for DNA quantity and quality. Fractions 6 to 9 were used for ligations. 100 ng of GENEWARE™ vector was ligated to 20 ng synthesized cDNA. Following ligations, the mixtures were kept at −20° C. For transformation, 1 μl to 10 μl ligation reaction mixture was added to 100 μl of competent E. coli cells (strain DH5α) and transformed using the heat shock method. After transformation, 900 μl SOC medium was added to the culture and it was incubated at 37° C. for 60 minutes. Transformation reactions were plated out on 22×22 cm LB/Amp agar plates and incubated overnight at 37° C.

Example 4 Poppy cDNA Library Construction in GENEWARE™ Vectors

A. Plant Growth. A wild population of Papaver rhoeas resistant to the auxin 2,4-Dichlorophenoxyacetic acid (2,4-D) was identified from a location in Spain and seed was collected. The seed was germinated and yielded a morphologically heterogeneous population. Leaf shape varied from deeply to shallowly indented. Latex color in some individuals was pure white when freshly cut, slowly changing to light orange then brown. Latex in other individuals was bright yellow or orange and rapidly changed to dark brown upon exposure to air. A single plant (PR4) with the white latex phenotype was used to generate the library.

B. RNA extraction. Approximately 1.5 g of leaves and stems were collected and frozen on liquid nitrogen. The tissue was ground to a fine powder and transferred to a 50 mL conical polypropylene screw cap centrifuge tube. Ten mL of TRIZOL reagent (Life Technologies, Inc.; Rockville, Md.) was added and vortexed at high speed in short intervals for several minutes until an aqueous mixture was attained. Two mL of chloroform was added and the suspension was again vortexed at high speed for several minutes. The tube was centrifuged 15 minutes at 3100 rpm in a tabletop centrifuge (GP Centrifuge, Beckman Coulter, Inc; Fullerton, Calif.) for resolution of the phases. The aqueous supernatant was then carefully transferred to diethylpyrocarbonate (DEPC)-treated 1.5 mL microtubes and total RNA was precipitated with 0.6 volumes of isopropanol. To facilitate precipitation, the solution was allowed to stand 10 minutes at room temperature after thorough mixing. Following centrifugation for 10 minutes at 8000 rpm in a microcentrifuge (model 5415C, Eppendorf AG, Hamburg), the pellet of total RNA was washed with 70% ethanol, briefly dried and resuspended in 200 μL DEPC-treated deionized water. A 10 μL aliquot was examined by non-denaturing agarose gel electrophoresis.

C. cDNA synthesis. To generate cDNA, approximately 50 μg of total RNA was primed with 250 pmole of first strand oligo (TAIL: 5′-GAG-GAT-GTT-AAT-TAA-GCG-GCC-GCT-GCA-G(T)₂₃-3′) (SEQ ID NO:7555) in a volume of 250 μL using 1000 units of Superscript reverse transcriptase (Life Technologies, Inc.; Rockville, Md.) for 90 minutes at 42° C. Phenol extraction was performed by adding an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1 v/v), vortexing thoroughly, and centrifuging 5 minutes at 14,000 rpm in an Eppendorf microfuge. The aqueous supernatant phase was transferred to a fresh microfuge tube and the first strand cDNA:mRNA hybrids were precipitated with ethanol by adding 0.1 volume of 3 M sodium acetate and 2 volumes of absolute ethanol. After 5 minutes at room temperature, the tube was centrifuged 15 minutes at 14,000 rpm. The pellet was washed with 80% ethanol, dried briefly and resuspended in 100 μL TE buffer (10 mM TrisCl, 1 mM EDTA, pH 8.0). After adding 10 μL Klenow buffer (RE buffer 2, Life Technologies, Inc.; Rockville, Md.) and dNTPs (Life Technologies, Inc.; Rockville, Md.) to a final concentration of 1 mM, second strand cDNA was generated by adding 10 units of Klenow enzyme (Life Technologies, Inc.; Rockville, Md.) and 2 units of RNase H (Life Technologies, Inc.; Rockville, Md.) and incubating at 37° C. for 2 hrs. The buffer was adjusted with β-nicotinamide adenine dinucleotide (β-NAD) by addition of E. coli ligase buffer (Life Technologies, Inc.; Rockville, Md.), and adenosine triphosphate (ATP, Sigma Chemical Company, St. Louis, Mo.) was added to a final concentration of 0.6 mM. Double stranded phosphorylated cDNA was generated by addition of 10 units of E. coli DNA ligase (Life Technologies, Inc.; Rockville, Md.) and 10 units of T4 polynucleotide kinase (Life Technologies, Inc.; Rockville, Md.) and incubating for 20 minutes at ambient temperature.

The double stranded cDNA was isolated through phenol extraction and ethanol precipitation, as described above. The pellet was washed with 80% ethanol, dried briefly and resuspended in a minimal volume of TE. The resuspended pellet was ligated overnight at 16° C. with 50 pmole of kinased AP3-AP4 adapter (AP-3: 5′-GAT-CTT-AAT-TAA-GTC-GAC-GAA-TTC-3′ / AP-4: 5′-GAA-TTC-GTC-GAC-TTA-ATT-AA-3′) (SEQ ID NOs:7556-7557) and 2 units of T4 DNA ligase (Life Technologies, Inc.; Rockville, Md.). Ligation products were amplified by 20 cycles of PCR using AP-3 primer and examined by agarose gel electrophoresis.

Expanded adapter-ligated cDNA was digested overnight at 37° C. with PacI and NotI restriction endonucleases. The GENEWARE™ vector pBSG1056 (Large Scale Biology Corporation, Vacaville, Calif.) was similarly treated. Digested cDNA and vector were electrophoresed a short distance through low-melting temperature agarose. After visualizing with ethidium bromide and excising the appropriate fraction(s), the fragments were isolated by melting the agarose and quickly diluting 5:1 with TE buffer to keep it from solidifying. The diluted fractions were mixed in the appropriate ratio (approximately 10:1 vector:insert ratio) and ligated overnight at 16° C. using T4 DNA ligase. Characterization of the ligation revealed an average insert size of 1.27 kb. The ligation was transferred to LSBC, Inc. where large scale arraying was carried out. Random sequencing of nearly 100 clones indicated that about 40% of the inserts had full-length open reading frames.

Example 5 ABRC Library Construction in GENEWARE™ Expression Vectors

Expressed sequence tag (EST) clones were obtained from the Arabidopsis Biological Resource Center (ABRC; The Ohio State University, Columbus, Ohio 43210). These clones originated from Michigan State University (from the labs of Dr. Thomas Newman of the DOE Plant Research Laboratory and Dr. Chris Somerville, Carnegie Institution of Washington) and from the Centre National de la Recherche Scientifique Project (CNRS project; donated by the Groupement De Recherche 1003, Centre National de la Recherche Scientifique, Dr. Bernard Lescure et al.). The clones were derived from cDNA libraries isolated from various tissues of Arabidopsis thaliana var Columbia. A clone set of 11,982 clones was received as glycerol stocks arrayed in 96 well plates, each with an ABRC identifier and associated EST sequence.

An ORF finding algorithm was run on the EST clone set to find potential full-length genes. Approximately 3,200 full-length genes were found and used to make GENEWARE™ constructs in the sense orientation. Five thousand of the remaining clones (not full-length) were used to make GENEWARE™ constructs in the antisense orientation.
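The ORF finding algorithm itself is not detailed in this section. As a rough illustration of the general idea only, the following Python sketch reports the longest ATG-to-stop open reading frame in the three forward frames of a sequence. The function name and the choice of forward frames only are assumptions and do not describe the program actually used.

```python
# Hypothetical sketch of a simple ORF search; not the program used in the text.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return the longest ATG-to-stop ORF in the three forward frames ('' if none)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best
```

A clone could then be flagged as potentially full-length when such an ORF is found with upstream sequence preceding its start codon, although the criteria applied to the ABRC set are not stated here.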

Full-length clones used to make constructs in the sense orientation were grown and DNA was isolated using Qiagen (Qiagen Inc.; Valencia, Calif. 91355) mini-preps. Each clone was digested with the eight base pair enzymes NotI and Sse 8387. The resultant fragments were individually isolated and then combined. The combined fragments were ligated into pGTN P/N vector (with polylinker extending from PstI to NotI, 5′ to 3′). For each set of 96 original clones approximately 192 colonies were picked from the pooled GENEWARE™ ligations, grown until confluent in deep-well 96-well plates, DNA prepped and sequenced. The ESTs matching the ABRC data were checked bioinformatically by BLAST and a list of missing clones was generated. Pools of clones found to be missing were prepared and subjected to the same process. The entire process resulted in greater than 3,000 full-length sense clones.

The negative sense clones were processed in the same manner, but ligated into pGTN N/P vector (with polylinker extending from NotI to PstI, 5′ to 3′). For each set of 96 original clones approximately 192 colonies were picked from the pooled GENEWARE™ ligations and DNA prepped. The DNA from the GENEWARE™ ligations was subjected to RFLP analysis using the 4 base cutter TaqI. Novel patterns were identified for each set. The RFLP method was applicable only for comparison within a single ABRC plate. This procedure resulted in greater than 6,000 negative sense clones.

The identified clones were re-arrayed, transcribed, encapsidated and used to inoculate plants.

Example 6 Regulatory Factors cDNA Library Construction in GENEWARE™ Vectors

Transcription factors represent a class of genes that regulate and control many aspects of plant physiology, including growth, development, metabolism and response to the environment. In order to analyze a collection of regulatory factor genes, the PCR-based methods described below were used to construct a library of such genes from Arabidopsis thaliana and Saccharomyces cerevisiae. In addition, clones containing genes corresponding to regulatory factors from N. benthamiana, Oryza sativa and Papaver rhoeas were selected, based on cDNA sequence, from the libraries generated in GENEWARE™ vectors as described above.

A. Regulatory Factor Gene Targeting. Publicly accessible databases of genome sequence include data on a wide range of organisms, from microbes to human. Many of these databases include annotation along with gene sequences that predict function of the genes based on either experimental data or homology to characterized genes. The MIPS (Munich Information Center for Protein Sequences) database contains sequence information and annotation for both Arabidopsis thaliana and Saccharomyces cerevisiae genomes. Based on this annotation, open reading frame sequences of predicted yeast and Arabidopsis transcription factors were downloaded from MIPS and used for PCR primer design.

B. PCR Primer Design

18-20 base pairs of nucleotide sequence at both ends of each downloaded ORF were extracted and used to design the gene-specific portion of individual primers. In addition, flanking sequence and restriction sites were added to the ends of primers as shown in the following example:

SEQ ID NO:7558, 5′ primer:
GCCTTAATTAACTGCAGC atgtcgggtcgtgaagatgaag
(flanking sequence with PacI and PstI sites, followed by the 5′ gene-specific sequence)

SEQ ID NO:7559, 3′ primer:
TTGATATCTAGAGCGGCCGCTTA tcatgtttcatcatcgaaatcatca
(flanking sequence with EcoRV, XbaI and NotI sites, followed by the 3′ gene-specific sequence)
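The primer layout shown in this example can be sketched in Python. The two flanking sequences are copied from SEQ ID NOs:7558 and 7559; the function names, the default length of 20 bases, and the assumption that the 3′ gene-specific portion is the reverse complement of the end of the ORF are illustrative and are not stated explicitly in the text.

```python
# Hypothetical sketch of the primer layout; helper names are assumptions.
FLANK_5 = "GCCTTAATTAACTGCAGC"        # flanking part of SEQ ID NO:7558 (PacI, PstI)
FLANK_3 = "TTGATATCTAGAGCGGCCGCTTA"   # flanking part of SEQ ID NO:7559 (EcoRV, XbaI, NotI)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]

def design_primers(orf, n=20):
    """Return (5' primer, 3' primer) built from n bases at each end of the ORF."""
    orf = orf.lower()
    forward = FLANK_5 + orf[:n]
    # Assumption: the 3' primer anneals to the sense strand, so its
    # gene-specific part is the reverse complement of the ORF's last n bases.
    reverse = FLANK_3 + reverse_complement(orf[-n:])
    return forward, reverse
```

Under this assumption the gene-specific part of a 3′ primer begins with the reverse complement of the stop codon, which is consistent with the "tca" at the start of the lower-case portion of SEQ ID NO:7559.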

C. Arabidopsis and Yeast Template Preparation. Total RNA was isolated from flowers and apical meristems of the Arabidopsis ecotype Columbia using the Qiagen RNeasy kit (Cat. no. 75162). mRNA was subsequently isolated from total RNA using the MACS mRNA isolation kit from Miltenyi Biotec (cat. no. 751-02). First strand cDNA was synthesized from 10 μg of mRNA in the presence of Superscript II reverse transcriptase (Gibco BRL cDNA synthesis kit; cat. no. 18248-013) and NotI primer (5′-GACTAGTTCTAGATCGCGAGCGGCCGCCC(T)₃₀VN-3′) (SEQ ID NO:7560). The second strand was synthesized according to the manufacturer's instructions. This cDNA was diluted 1:5 prior to DNA amplification.

Since most yeast genes do not contain introns, genomic DNA was used directly as a template for PCR. Genomic DNA from S. cerevisiae S288C was obtained directly from Research Genetics (ResGen, an Invitrogen company, Huntsville, Ala., catalog #40802).

D. PCR Amplification. 1 μl of template DNA was subjected to PCR using the Hi Fi Platinum (hot start) DNA polymerase (Gibco-BRL cat. no. 11304-011) and gene-specific primers for each ORF. Each 50 μl reaction contained: 5 μl 10× buffer, 1 μl of 10 mM dNTP, 2 μl of 50 mM MgSO₄, 1 μl of template cDNA, 10 pmoles of each primer and 0.2 unit of Platinum Hi Fi DNA polymerase. PCR reactions were carried out in an MJ Research (Model PTC 200) thermal cycler programmed with the following conditions:

-   3 min at 95° C.
-   30 cycles [95° C. 30 sec., 50° C. 30 sec., 72° C. 3 min.]
-   72° C. 10 min.

Following PCR, reactions were stored at −20° C. until ready for ligation.
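As a worked check of the cycling conditions above, the programmed hold time can be totalled in a few lines of Python. The function name is hypothetical, and ramp times between temperatures are ignored, so the actual run time on the thermal cycler would be somewhat longer.

```python
def pcr_program_minutes(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time in minutes (temperature ramps not included)."""
    return (initial_s + cycles * sum(step_seconds) + final_s) / 60


# 3 min at 95 C; 30 cycles of 30 s, 30 s and 3 min; final 10 min at 72 C.
total = pcr_program_minutes(180, 30, (30, 30, 180), 600)
print(total)  # 133.0
```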

E. Subcloning ORFs into GENEWARE™ Vectors. To minimize the cost and labor involved in cloning individual ORFs, PCR products containing different ORFs were cloned into the GENEWARE™ vectors as pooled DNAs. 30-75 PCR products were pooled, digested with PacI and NotI and purified from an agarose gel. Purified DNA was subsequently ligated into the GENEWARE™ vector (5PN-Cap digested with PacI and NotI). Single colonies were selected and grown, and their DNA was analyzed for the presence of insert. Inserts were gel purified and sequenced, and the sequence was compared to the MIPS protein database to confirm that they covered the complete ORF. Unique sequences representing various related genes were selected to cover different genes within a multi-gene family. The efficiency of pooled cloning ranged from 30-50% (i.e., 30-50 clones were identified from analysis of 100 pooled PCR products). Following sequence identification of the clones, PCR products that were not represented in the first round of cloning were subsequently pooled together and subjected to a second round.

Example 7 Other Libraries: Regulatory Gene Selection

For each of the cDNA libraries generated from N. benthamiana, Oryza sativa and Papaver rhoeas, a unigene set of clones was established. Following basic library construction, all DNA sequences were subjected to BLASTN analysis against each other. Sequences that showed perfect homology across a minimum of 50 base pairs were clustered together. At this level each cluster putatively represents a unique gene. Cluster size varies depending on the size and complexity of the sequence population (sequenced library). A cluster may have only one sequence member, or consist of hundreds of member sequences. The clone with the 5′-most sequence in a cluster was then selected to represent the gene. A collection of all the 5′-most sequences or clones was established as the unigene set for that particular library. In the example illustrated below, 4 EST sequences were clustered, representing a putative gene. The EST Seq I contained the most sequence information toward the 5′-end, indicating that this clone had the longest insert relative to other cluster members. This process allows removal of redundant clones and selection of the longest and most-likely full-length clones for subsequent screens.

Based on the sequence analysis and annotation of each unigene from each library, all clones that were homologous to known regulatory genes/transcription factors were targets for selection. Depending on the level of homology, some of the clones represented well characterized regulatory genes; however, many of the selected clones had only a modest level of homology to known genes or genes of very distantly related organisms. It is believed that this selection process can increase the probability of gene discovery and, by eliminating non-relevant clones, increase screen efficiency.
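The clustering step of this Example (grouping sequences that share a perfect match of at least 50 bases) may be sketched as follows. This is a simplified stand-in, not the BLASTN-based procedure actually used: it compares exact substrings on one strand only, it joins clusters transitively, and it does not select the 5′-most member. The function name is hypothetical.

```python
def cluster_by_shared_kmer(seqs: dict[str, str], k: int = 50) -> list[set[str]]:
    """Group sequences that share at least one exact k-base substring."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner: dict[str, str] = {}  # k-mer -> first sequence in which it was seen
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            owner = first_owner.setdefault(seq[i:i + k], name)
            if owner != name:
                parent[find(name)] = find(owner)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Sequences shorter than `k` cannot share a `k`-base match and therefore remain in single-member clusters.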

Example 8 Trichoderma cDNA Library Construction in GENEWARE™ Vectors

A. Growth and Induction of Trichoderma harzianum rifai 1295-22. Cultures of Trichoderma harzianum rifai 1295-22 were obtained from ATCC (cat. #20847) and propagated on PDA. Liquid cultures were inoculated and induced using a protocol derived from Vasseur et al. (Microbiology 141:767-774, 1995) and Cortes et al. (Mol. Gen. Genet. 260:218-225, 1998): agar-grown cells were used to inoculate a 100 mL culture in PDB and grown 48 hours at 29° C. with agitation. Mycelia were harvested by centrifugation, transferred to Minimal Media (MM) + 0.2% glucose, and incubated overnight at 29° C. with agitation. Mycelia were harvested again by centrifugation, washed with MM, resuspended in MM and incubated 2 hours at 29° C. with agitation. Mycelia were harvested again by centrifugation, divided into 2 aliquots, and used to inoculate 1) 125 mL MM + 0.2% glucose or 2) 125 mL MM + 1 mg/mL elicitor. Elicitor is a preparation of cell walls from Rhizoctonia solani grown in liquid culture and isolated according to Goldman et al. (Mol. Gen. Genet. 234:481-488, 1992). Induced and uninduced cultures were incubated at 29° C. with agitation, harvested after 24 and 48 hours by filtration and immediately frozen in liquid nitrogen. Aliquots were assayed for induction using 2-D gel SDS-PAGE to compare induced and uninduced cultures. Both induced and uninduced (24 hours) tissue was used for subsequent RNA isolation and library construction.

B. RNA Isolation and Library Construction. mRNA isolation was accomplished by magnetically labeling polyA⁺ RNA with oligo (dT) microbeads and selecting the magnetically labeled RNA over a column. The purified polyA⁺ RNA was then used for cDNA synthesis using a modified version of the full-length enrichment reactions (cap-capture method) described by Seki et al. (Plant J. 15:707-720, 1998). Specifically, isolated mRNA was primed with NotI-oligo d(T) primer to synthesize the first strand cDNA. After the synthesis reaction, a biotin group was chemically introduced to the diol residue of the cap structure of the mRNA molecule. RNase I treatment was then used to digest the mRNA/cDNA hybrids, followed by binding of streptavidin magnetic beads. After this step, the full-length cDNAs were removed from the beads by RNase H and tailed with oligo dG by terminal transferase or used directly in the second strand synthesis. For the oligo dG tailed samples, the second strand cDNAs were then synthesized with PacI-oligo dC primers and DNA polymerase. Additional modifications to the published procedure include: addition of trehalose and BSA as enzyme stabilizers in the reverse transcriptase reaction, a temperature of 50 to 60° C. for the first strand cDNA synthesis reaction, high stringency binding and washing conditions for capturing biotinylated cap-RNA/cDNA hybrids and substitution of the cDNA poly (dG) tailing step with a Sal-I linker ligation. The cDNA was size-fractionated over a column and the largest 2-3 fractions were collected and used to ligate with GENEWARE™ vector pBSG1057. The ligation reaction was transformed into E. coli DH5α and plated, the transformation efficiency was calculated, and the DNA from the transformants was subjected to the quality control steps described below:

-   1. cDNA synthesis/cloning: The cloning efficiency must be greater than 8×10⁵ cfu/μg.
-   2. Restriction enzyme digestion and sequencing: 500 to 1,000 transformants were picked and DNA isolated. cDNA inserts were digested out by appropriate restriction enzymes and checked by gel electrophoresis. The average insert size was calculated from 100 random clones. If the average size was >0.9 kbp, the DNA preps were then passed on to the sequencing group to obtain 5′-end sequences. Those sequences were used to further evaluate the quality of the library. Libraries that did not meet QC standards, such as high vector background (>5%), low full-length percentage (<60%), or short average insert size (<0.7 kbp), were discarded, and the entire procedure repeated.
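The discard criteria listed above may be expressed as a simple check, sketched here in Python. The function and argument names are hypothetical. The sketch applies only the stated discard thresholds; the text does not say how a library with an average insert size between 0.7 and 0.9 kbp was handled, and the sketch does not attempt to resolve that.

```python
def qc_failures(cfu_per_ug, vector_background, full_length_fraction, mean_insert_kbp):
    """Return the list of QC criteria a library fails (an empty list means pass).

    Fractions are given as values between 0 and 1.
    """
    failures = []
    if cfu_per_ug <= 8e5:            # efficiency must exceed 8x10^5 cfu/ug
        failures.append("cloning efficiency")
    if vector_background > 0.05:     # more than 5% vector background
        failures.append("vector background")
    if full_length_fraction < 0.60:  # fewer than 60% full-length clones
        failures.append("full-length percentage")
    if mean_insert_kbp < 0.7:        # average insert shorter than 0.7 kbp
        failures.append("average insert size")
    return failures
```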

C. Library Subtraction. The induced Trichoderma library in GENEWARE™ was constructed as above and a large number of clones were arrayed on a nylon membrane at high density (HD array). Based on the genomic size and expression levels of S. cerevisiae, 18,000 colonies were imprinted to provide 3-fold coverage of the expressed genes. Freshly grown colonies were plated out and picked into 384 well plates and then imprinted on nylon membranes in 3×3 format at duplicated locations. First strand cDNAs to use as probes were synthesized from mRNAs isolated from both induced and uninduced tissue and used to hybridize the HD arrays. The intensity of each clone after hybridization was quantitated by phosphoimage scanning. The locations of all 18,000 spots were tracked by Array Vision software, which also determined the local background and calculated the signal/noise ratio for every clone on the membrane. The data generated were then converted to Excel format and analyzed to obtain the fold of induction or down-regulation. Based on the measured noise level, a 5-fold increase or decrease, relative to controls, was used as a cutoff value. Clones displaying ≥5-fold induction or reduction on duplicated samples were chosen. These clones were robotically re-arrayed using a Qbot device (see below, Colony Array). DNA was prepped as described below and sequenced. Based on the clustering results, 5′-most unigenes were selected and re-arrayed using the procedures described for the Poppy library above: the total number of clones that were selected was 1,019 for the up-regulated library (Th03), and 851 for the down-regulated library (Th04). These clones were prepared as described below (DNA Preparation, Transcription, Inoculation) and tested in functional genomic screens for modified visual phenotypes.
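The 5-fold cutoff applied to duplicate spots can be illustrated with a minimal Python sketch. It assumes background-corrected, strictly positive intensities and requires every duplicate to meet the cutoff; both points, and the function name, are assumptions of the sketch rather than details given above.

```python
def classify_clone(induced, uninduced, cutoff=5.0):
    """Classify one clone from the intensities of its duplicate spots.

    induced / uninduced: intensities of the same spots probed with cDNA from
    induced and uninduced tissue. Returns "up", "down" or None.
    """
    ratios = [i / u for i, u in zip(induced, uninduced)]
    if all(r >= cutoff for r in ratios):
        return "up"       # candidate for the up-regulated library
    if all(r <= 1 / cutoff for r in ratios):
        return "down"     # candidate for the down-regulated library
    return None           # below the cutoff on at least one duplicate
```

For example, `classify_clone([500, 620], [100, 100])` returns `"up"`, whereas a clone with ratios of 5 and 3 on its two spots is not selected.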

Example 9 Colony Array

A. Colony Array: Picking. Ligations were transformed into E. coli DH5α cells and plated onto 22×22 cm Genetix “Q Trays” prepared with 200 mL agar, Amp¹⁰⁰. A Qbot device (Genetix, Inc., Christchurch, Dorset UK) fitted with a 96 pin picking head was used to pick and transfer desired colonies into 384-well plates according to the manufacturer's specifications and picking program SB384.SC1, with the following parameters:

Source

Container: Genetix bioassay tray

Color: White

Agar Volume: 200 mL

Destination

Container: Hotel (9 High)

Plate: Genetix 384 well plate

Time In Wells (sec): 2

Max Plates to use: # of 384 well plates

1st Plate: 1

Dips to Inoculate: 10

Well Offset: 1

Head

Head: 96 Pin Picking Head

First Picking Pin: 1

Pin Order: A1-H1, H2-A2 . . . (snaking)

Sterilizing

Qbot Bath #1

Bath Cycles: 4

Seconds in Dryer: 10

Wait After Drying: 10

(approximate picking time: 8 hrs/20,000 colonies)

Following picking, 384 well plates containing bacterial inoculum were grown in a HiGro chamber fitted with O₂ at 30° C., speed 6.5 for 12-14 hours. Following growth, plates were replicated using the Qbot with the following parameters, 2 replication runs per plate:

Source

Container: Hotel (9 High)

Plate: Genetix Plate 384 Well

Plates to replicate: 24

Start plate No.: 1

No. of copies: 1

Destination

Container: Universal Dest Plate Holder

Plate: Genetix Plate 384 Well

No. of Dips: 5

Head

Head: 384 Pin Gravity Gridding Head

Sterilizing

Qbot Bath #1

Bath cycles: 4

Seconds in Dryer: 10

Wait After Drying: 10

Airpore tape was placed over the replicated 384 well plates and the replicated plates were grown in the HiGro as above for 18-20 hours, sealed with foil tapes and stored at −80° C.

B. Colony Array: Gridding. Membrane filters were soaked in LB/Ampicillin for 10 minutes. Filters were aligned onto fresh 22×22 cm agar plates and allowed to dry on the plates 30 min. in a laminar flow hood. Plates and filters were placed in the Qbot and UV sterilized for 20 minutes. Following sterilization, plates/filters were gridded from 384 well plates using the Qbot according to the manufacturer's specifications with the following parameters:

Gridding Routine

Name: 3×3

Source

Container: Hotel (9 High)

Plate: Genetix Plate 384 Well

Max Plates: 8

Inking time (ms): 1000

Destination

Filter holder: Qtray

Gridding Pattern: 3×3, non-duplicate, 8

Field Order: front 6 fields

No. Filters: up to 15

Max stamps per ink: 1

Max stamps per spot: 1

Stamp time (ms): 1000

No. Fields in Filter: 2

No. Identical Fields: 2

Stamps between sterilize: 1

Head: 384 pin gravity gridding head

Pin Height Adjustment: No change

Qbot Bath #1

Bath cycles: 4

Dry time: 10 (Seconds)

Wait After Drying: 10 (Seconds)

C. Plate Rearray. 384 well plates were rearrayed into deep 96 well block format using the Qbot according to the manufacturer's instructions and the following rearray parameters, ×2 per plate:

Source

Container: Hotel (9 High)

Plate: Genetix Plate 384 Well

1st Plate: 1

Destination

Container: Universal Dest Plate Holder

Plate: Beckman 96 Deep Well Plate

1st plate: 1

Dips to Inoculate: 5

Well offset: 1

Max plates to use: 12 (or less)

Time in wells (sec): 2

Qbot Bath #1

Head: 96 pin picking head

First Picking Pin: 1

Pin Order: A1-H1, A2-H2, A3-H3 . . .

Bath cycles: 4

Sec. In dryer: 10

Wait after drying: 10

Following rearray, the 96-well blocks were covered with airpore tape and placed in incubator shakers at 37° C., 500 rpm for a total of 24 hours. Plates were removed and used for DNA preparation.

Example 10 DNA Preparation

Plasmid DNA was prepared in a 96-well block format using a Qiagen Biorobot 9600 instrument (Qiagen; Valencia, Calif.) according to the manufacturer's specifications. In this 96-well block format, 900 μl of cell lysate was transferred to the Qiaprep filter and vacuumed 5 min. at 600 mbar. Following this vacuum, the filter was discarded and the Qiaprep Prep-Block was vacuumed for 2 min at 600 mbar. After adding buffer, samples were centrifuged for 5 min at 600 rpm (Eppendorf benchtop centrifuge fitted with 96-wp rotor) and subsequently washed ×2 with PE. Elution was carried out for 1 minute, followed by a 5 min. centrifugation at 6000 rpm. Final volume of DNA product was approximately 75 μl.

Example 11 Generation of Raw Sequence Data and Filtering Protocols

High-throughput sequencing was carried out using the PTC 200 and TETRAD PCR machines (MJ Research; Watertown, Mass.) in 96-well plate format in combination with two ABI 377 automated DNA sequencers (PE Corporation; Norwalk, Conn.). The throughput at present is six 96-well plates per day. The quality of sequence data is improved by filtering the raw sequence output from the sequencer. One criterion is that the unreadable bases make up less than 10% of the total number of bases for any sequence and that there are no more than ten consecutive Ns in the middle part of the sequence (40-450). The sequences that pass these tests are defined as being of high quality. The second step for improving the quality of a sequence is to remove the vector from the sequence. There are two advantages of this process. First, once the vector sequence has been located, its position can be used to align it to the input sequence, and the quality of the sequence can be evaluated from the alignment between the vector sequence and the target sequence. Second, the removal of the vector sequence greatly improves the signal-to-noise ratio and makes the analysis of the resulting database search much easier. A third important pre-filtering step is to eliminate the duplicates in a library, which speeds up the analysis and avoids redundant analyses.
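The first filtering criterion can be sketched in Python as shown below. The function name and keyword arguments are hypothetical, and positions 40-450 are interpreted as 1-based and inclusive, which is an assumption; vector removal and duplicate elimination are not covered.

```python
import re


def is_high_quality(seq, max_n_fraction=0.10, max_n_run=10, window=(40, 450)):
    """Apply the N-content and N-run criteria to one raw sequence read."""
    seq = seq.upper()
    if not seq:
        return False
    # Unreadable bases (N) must be less than 10% of all bases.
    if seq.count("N") / len(seq) >= max_n_fraction:
        return False
    # No more than ten consecutive Ns within positions 40-450 (1-based, inclusive).
    middle = seq[window[0] - 1:window[1]]
    longest_run = max((len(run) for run in re.findall(r"N+", middle)), default=0)
    return longest_run <= max_n_run
```

A read with an 11-base run of Ns inside the window is rejected even when its overall N content is low, while the same run confined to the first 39 bases is tolerated.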

Example 12 Automated Transcriptions and Encapsidations

Plasmid DNA preparations were subjected to automated transcription reactions in a 96-well plate format using a Tecan Genesis Assay Workstation 200 robotic liquid handling system (Tecan, Inc.; Research Triangle Park, N.C.) according to the manufacturer's specifications, operating on the Gemini Software (Tecan, Inc.) program “Automated_Txns.gem”. For these reactions, reagents from Ambion, Inc. (Austin, Tex.) were used according to the manufacturer's specifications at 0.4× reaction volumes. Following the robotic set-up of transcription reactions, 96-well plates were removed from the Tecan, shaken on a platform shaker for 30 sec., centrifuged in an Eppendorf tabletop centrifuge fitted with a 96-well plate rotor at 700 rcf for 1 minute and incubated at 37° C. for 1.5 hours.

During the transcription reaction incubation, encapsidation mixture was prepared according to the following recipe:

Component                        1X Solution
Sterile ddi H₂O                  100.5 μL
1 M Sodium Phosphate              13.0 μL
TMV Coat Protein (20 mg/mL)        6.5 μL
Total                            120 μL per well

This mixture was placed in a reservoir of the Tecan and added to the 96-well plates containing transcription reaction following the incubation period using Gemini software program “9_Plates.gem”. After adding encapsidation mixture, plates were shaken for 30 sec. on a platform shaker, briefly centrifuged as described above, and incubated at room temperature overnight. Prior to inoculation, encapsidated transcript was sampled and subjected to agarose gel analysis for QC.
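The per-well recipe above scales linearly with the number of wells. A minimal Python sketch of that scaling follows; the function name and the optional overage allowance for reservoir dead volume are assumptions and do not appear in the recipe itself.

```python
# Per-well (1X) volumes in microliters, taken from the recipe above.
RECIPE_UL = {
    "sterile ddi H2O": 100.5,
    "1 M sodium phosphate": 13.0,
    "TMV coat protein (20 mg/mL)": 6.5,
}


def batch_volumes(n_wells, overage=0.10):
    """Scale the 1X recipe to n_wells, with an assumed fractional overage."""
    factor = n_wells * (1 + overage)
    return {name: round(vol * factor, 1) for name, vol in RECIPE_UL.items()}


# One 96-well plate without overage: 9648.0, 1248.0 and 624.0 microliters.
print(batch_volumes(96, overage=0))
```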

Example 13 Infection of N. benthamiana Plants with GENEWARE™ Viral Transcripts

Plant Growth

N. benthamiana seeds were sown in 6.5 cm pots filled with Redi-earth medium (Scotts) that had been pre-wetted with fertilizer solution (147 kg Peters Excel 15-5-15 Cal-Mag (The Scotts Company; Marysville, Ohio), 68 kg Peters Excel 15-0-0 Cal-Lite, and 45 kg Peters Excel 10-0-0 MagNitrate in 596 L hot tap H₂O, injected (H. E. Anderson; Muskogee, Okla.) into irrigation water at a ratio of 200:1). Seeded pots were placed in the greenhouse for 1 d, transferred to a germination chamber (Carolina Greenhouses; Kinston, N.C.) set to 27° C. for 2 d, and then returned to the greenhouse. Shade curtains (33% transmittance) were used to reduce solar intensity in the greenhouse, and artificial lighting, a 1:1 mixture of metal halide and high pressure sodium lamps (Sylvania) that delivered an irradiance of approximately 220 μmol m⁻² s⁻¹, was used to extend day length to 16 h and to supplement solar radiation on overcast days. Evaporative cooling and steam heat were used to regulate greenhouse temperature, maintaining a daytime set point of 27° C. and a nighttime set point of 22° C. At approximately 7 days post sowing (dps), seedlings were thinned to one seedling per pot and at 17 to 21 dps, the pots were spaced farther apart to accommodate plant growth. Plants were watered with Hoagland nutrient solution as required. Following inoculation, waste irrigation water was collected and treated with 0.5% sodium hypochlorite for 10 minutes to neutralize any viral contamination before discharging into the municipal sewer.

Inoculation of Plants

For each GENEWARE™ clone, 180 μL of inoculum was prepared by combining equal volumes of encapsidated RNA transcript and FES buffer [0.1 M glycine, 0.06 M K₂HPO₄, 1% sodium pyrophosphate, 1% diatomaceous earth (Sigma), and either 1% silicon carbide (Aldrich), or 1% Bentonite (Sigma)]. The inoculum was applied to three greenhouse-grown Nicotiana benthamiana plants at 14 or 17 days post sowing (dps) by distributing it onto the upper surface of one pair of leaves of each plant (~30 μL per leaf). Either the first pair of leaves or the second pair of leaves above the cotyledons was inoculated on 14 or 17 dps plants, respectively. The inoculum was spread across the leaf surface using one of two different procedures. The first procedure utilized a Cleanfoam swab (Texwipe Co, NJ) to spread the inoculum across the surface of the leaf while the leaf was supported with a plastic pot label (3/4×5 2M/RL, White Thermal Pot Label, United Label). The second implemented a 3″ cotton tipped applicator (Calapro Swab, Fisher Scientific) to spread the inoculum and a gloved finger to support the leaf. Following inoculation the plants were misted with deionized water and maintained in a greenhouse.

Infection Scoring

At 13 days post inoculation (dpi), the plants were examined visually and a numerical score was assigned to each plant to indicate the extent of viral infection symptoms based on phenotypic characteristics: 0=no infection, 1=possible infection, 2=infection symptoms limited to leaves <50-75% fully expanded, 3=typical infection, 4=atypically severe infection, often accompanied by moderate to severe wilting and/or necrosis.

Example 14 Metabolic Screens

A. Sample Generation. Individual dwarf tobacco (Nicotiana benthamiana, Nb) plants were manually transfected with a unique DNA sequence at 14 or 17 days post sowing using the GENEWARE™ viral vector technology (Example 13). Plants were grown and maintained under greenhouse conditions.

Samples were grouped into sets of up to 96 samples per set for inoculation, harvesting and analysis. Each sample set included 8 negative control (reference) samples, up to 80 unknown (test) samples, and 8 quality control samples.

B. Harvesting. At 14 days after infection, infected leaf tissue, excluding stems and petioles, was harvested from plants with an infection score of 3. Infected tissue was placed in a labeled, 50-milliliter (mL), plastic centrifuge tube containing a tungsten carbide ball approximately 1 cm in diameter. The tube was immediately capped and dipped in liquid nitrogen for approximately 20 seconds to freeze the sample as quickly as possible, minimizing degradation of the sample due to biological processes triggered by the harvesting process. Harvested samples were maintained at −80° C. between harvest and analysis. Each sample was assigned a unique identifier, which was used to correlate the plant tissue to the DNA sequence with which the plant was transfected. Each sample set was assigned a unique identifier, referred to as the harvest or meta rack ID.

C. Extraction. Prior to analysis, the frozen sample was homogenized by placing the centrifuge tube on a mechanical shaker. The action of the tungsten carbide ball during approximately 30 seconds of vigorous shaking reduced the frozen whole leaf tissue to a finely homogenized frozen powder. Approximately 1 gram of the frozen powder was extracted with approximately 7.5 mL of a solution of isopropanol (IPA):water 70:30 (v:v), to achieve a 0.133 g/mL ratio, containing pentadecanoic acid ethyl ester (C15:0 EE) and p-hydroxybenzoic acid as surrogate standards, by shaking at room temperature for 30 minutes.

D. Fractionation. A 200 μL aliquot of the IPA:water extract was transferred to a clean glass container and referred to as Fraction 2 (F2). A 1200 microliter (μL) aliquot of the IPA:water extract was partitioned with 1200 μL of hexane. The hexane layer was removed to a clean glass container. This hexane extract is referred to as Fraction 1 (F1). A 90 μL aliquot of the hexane extracted IPA:water extract was removed to a clean glass container. This aliquot is referred to as Fraction 4 (F4). The remaining hexane extracted IPA:water extract is referred to as Fraction 3 (F3). Each fraction for each sample was assigned a unique fraction aliquot ID (sample name).

E. Sample Preparation & Data Generation

Fraction 1: The hexane extract was evaporated to dryness under nitrogen at room temperature. The sample containers were sealed and stored at 4° C. prior to analysis, if storage was required. Immediately prior to capillary gas chromatographic analysis using flame ionization detection (GC/FID), the F1 residue was reconstituted with 180 μL of hexane containing pentacosane and hexatriacontane, which were used as internal standards for the F1 analyses. The chromatographic data files generated following GC separation and flame ionization detection were named with the Fraction 1 aliquot ID for each sample and stored in a folder named after the harvest rack (sample set) ID. FIG. 7 a summarizes the GC/FID parameters used to analyze Fraction 1 samples.

Fraction 2: The F2 aliquot was evaporated to dryness under nitrogen at room temperature and reconstituted in heptane containing 2 internal standards, undecanoate methyl ester (C11:0) and lignoceroate methyl ester (C24:0). In general, Fraction 2 is designed to analyze esterified fatty acids, such as phospholipids, triacylglycerides, and thioesters. In order to analyze these compounds by GC/FID, they were transmethylated to their respective methyl esters by addition of sodium methoxide in methanol and heat. Excess reagent was quenched by the addition of a small amount of water, which results in phase separation. The fatty acid methyl esters (FAMEs) were contained in the organic phase. FIG. 7 b summarizes the GC/FID parameters used to analyze Fraction 2 samples.

Fraction 3: The remaining hexane extracted IPA:water extract (F3) was evaporated under nitrogen at 55° C. The residue was reconstituted with 400 μL of pyridine containing hydroxylamine hydrochloride and the internal standards, n-octyl-β-D-glucopyranoside and tetrachlorobenzene (OXIME solution). The derivatization was completed by the addition of 400 μL of the commercially available reagent (N,O-bis[trimethylsilyl]trifluoroacetamide) + 1% trimethylchlorosilane (BSTFA + 1% TMCS). The chromatographic data files generated following GC separation and flame ionization detection were named with the Fraction 3 aliquot ID for each sample and stored in a folder named after the harvest rack (sample set) ID. FIG. 7 c summarizes the GC/FID parameters used to analyze Fraction 3 samples.

Fraction 4: The F4 aliquot was diluted with 90 μL of distilled water and 20 μL of a 0.1 N hydrochloric acid solution containing norvaline and sarcosine, which are amino acids that are used as internal standards for the amino acid analysis. Immediately prior to high performance liquid chromatographic analysis using fluorescence detection (HPLC/FLD), the amino acids in F4 were mixed in the HPLC injector at room temperature with buffered orthophthaldehyde solution, which derivatizes primary amino acids, followed by fluorenyl methyl chloroformate, which derivatizes secondary amino acids. Following HPLC separation and fluorescence detection, chromatographic data files were generated for each sample, named with a sequential number which can be tracked back to the F4 aliquot ID, and stored in a folder named after the harvest rack (sample set) ID. FIG. 7 d summarizes the HPLC/FLD parameters used to analyze Fraction 4 samples.

F. Carbohydrate analysis by digestion of the Dionex extracted plant residue. Approximately 1 gram of frozen homogenized plant tissue was weighed into stainless steel extraction cartridges sandwiched between fiberglass filters. The samples were extracted with approximately 12 mL of 50:50 (v:v) isopropanol:water containing 0.1 N potassium hydroxide at 120° C. and 2000 psi. The extracted sample residue was dried for 2 hours at 100° C. The dry residue, 10 to 20 mg, was transferred from the extraction cartridge into a 13×100 mm glass tube containing 0.5 mL of 0.5 N hydrochloric acid in methanol and 0.12 mL of methyl acetate, blanketed with nitrogen, then sealed with a TEFLON coated screw cap and heated for 16 hours at 80° C. The liquid phase was then transferred (using an 8 channel pipettor) to a glass insert supported by a 96 well aluminum block plate containing 10 μL of t-butanol, which was then evaporated to dryness under hot flowing nitrogen. The methyl-glycosides and methyl-glycosides methyl ester residues generated by the methanolysis were silylated in 0.1 mL pyridine and 0.1 mL of the commercially available reagent (N,O-bis[trimethylsilyl]trifluoroacetamide) + 1% trimethylchlorosilane (BSTFA + 1% TMCS) at room temperature for one hour. The derivatized sample was analyzed by GC/FID using a DB1 capillary column (15 meters, 250 microns I.D., 0.25 microns film thickness) with an 11 minute temperature program (160° C. to 190° C. at 5° C./min, then 190° C. to 298° C. at 36° C./min, followed by a 2 minute hold). The dual injection GC analyzes 192 samples/day from two 96 well plates. Table 1 lists the carbohydrates that have been identified and quantitated in the sample extracts.
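The stated 11 minute length of the temperature program follows from the ramp rates and the final hold, as the short Python check below shows (6 min for the first ramp, 3 min for the second, plus the 2 min hold). The function name is hypothetical.

```python
def oven_program_minutes(start_c, ramps, final_hold_min):
    """Duration of a GC oven program.

    ramps: sequence of (target temperature in C, rate in C per minute).
    """
    total = 0.0
    current = start_c
    for target, rate in ramps:
        total += abs(target - current) / rate
        current = target
    return total + final_hold_min


# 160 -> 190 C at 5 C/min, then 190 -> 298 C at 36 C/min, then a 2 min hold.
print(oven_program_minutes(160, [(190, 5), (298, 36)], 2))  # 11.0
```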

G. Data Analysis & Hit Detection. Two complementary methods were used to identify modifications in the metabolic profile of samples. These data analysis methods are called automated data analysis (ADA) and quantitative data analysis (QUANT). ADA was used to identify hits in Fractions 1, 2, 3, and 4, whereas QUANT was used to identify hits in the Carbohydrate Fraction and Fractions 2 and 4. If either method identified a fraction as a hit, the nucleic acid sequence used to transfect the plant tissue from which that fraction was obtained was classified as a hit.

ADA employs a qualitative pattern recognition approach using ABNORM (The Dow Chemical Company). ABNORM is described in U.S. Pat. No. 5,592,402 (incorporated herein by reference). ADA was performed on chromatograms from Fractions 1 through 4. The ADA process developed a statistical model from a subset of all the available chromatograms within an analysis group. The chromatograms in this subset are defined to be “normal”, and a chromatogram cannot be part of the normal subset if it fails any one of the following three tests. The first test involves alignment. A chromatogram may fail to align if A) correlation is too small in any one of the correlation windows, B) extrapolation is too great as defined by the user, or C) maximum correlation is found too near the edge of the correlation window. The second test involves the internal standard. If a chromatogram has any one of the user-selected internal standards with an area count more than 5 median absolute deviations (MADs) from the median area count of the chromatograms in the analysis group, it is marked as “bad” and cannot be included in the normal subset. The third test involves the total area count. If a chromatogram has a total area count more than 5 MADs below the median total area count of the chromatograms in the analysis group, it is likewise marked as “bad” and excluded from the normal subset.

Chromatograms that pass the above three tests are defined as “good” samples. The normal subset was further determined as a certain percent (user defined) of the “good” samples that are nearest in terms of the 2-norm. The model developed on the “normal” subset was then used to test all of the chromatograms in the analysis group for statistically significant differences from the model.
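The second and third tests can be illustrated with a minimal Python sketch. This is not the ABNORM implementation: it handles a single internal standard, omits the alignment test and the 2-norm selection of the normal subset, and all function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def flag_bad(internal_std_areas, total_areas, n_mads=5.0):
    """Return one boolean per chromatogram, True where test 2 or test 3 fails."""
    is_med, is_mad = median(internal_std_areas), mad(internal_std_areas)
    tot_med, tot_mad = median(total_areas), mad(total_areas)
    flags = []
    for is_area, total in zip(internal_std_areas, total_areas):
        # Test 2: internal standard area more than n_mads MADs from the median.
        fails_internal_std = abs(is_area - is_med) > n_mads * is_mad
        # Test 3: total area more than n_mads MADs BELOW the median (one-sided).
        fails_total_area = total < tot_med - n_mads * tot_mad
        flags.append(fails_internal_std or fails_total_area)
    return flags
```

Note that the total-area test is one-sided: an unusually large total area does not by itself mark a chromatogram as “bad”, whereas the internal standard test is two-sided.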

Updated models for each fraction were generated for each sample set. Detection limits were generated for each of the fractions.

Quantitative data analysis was based on individual peak areas. Quantitative data analysis was applied to specific compounds of interest in Fraction 2 (fatty acids), Fraction 4 (amino acids) and carbohydrate analysis. Peaks for these compounds were identified based on retention time. The peak areas corresponding to these compounds were generated. For Fraction 2 and the Carbohydrate Fraction, the relative percent of the peak areas for the compounds in Table 1 were calculated. The average (x̄) and standard deviation (STD) of the relative % of the peak areas for the individual compounds were calculated from the reference chromatograms generated with the sample set. The average and STD were used to calculate a range for each compound. Depending on the compound, this range was typically x̄ ± 3 or 5 STDs. If the relative percent of the peak area from a sample was outside this range, the compound was considered to be significantly different from the ‘normal’ level and the sample was identified as a hit for F2 or the Carbohydrate Fraction. For Fraction 4, the concentration, in micrograms/gram, was calculated for each of the amino acids listed in Table 1 from calibration standards analyzed within the same sample set. The amino acid concentrations from references were used to calculate the ‘normal’ range from the x̄ and STD for each amino acid. If the amino acid concentration for a sample fell outside this range, the amino acid was considered to be different from ‘normal’ and the sample was identified as a hit for F4.
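The range test described above for Fraction 2 and the Carbohydrate Fraction may be sketched in Python as follows. The function names are hypothetical, and the use of the sample standard deviation (n−1 denominator) is an assumption, since the text does not state which form of STD was used.

```python
from statistics import mean, stdev


def relative_percent(peak_areas):
    """Convert a {compound: peak area} mapping to relative percent of the total."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def quant_hits(sample, references, n_std=3.0):
    """Return compounds whose relative % lies outside mean +/- n_std * STD.

    sample: {compound: peak area} for the test sample.
    references: list of such mappings for the reference chromatograms.
    """
    sample_pct = relative_percent(sample)
    ref_pcts = [relative_percent(ref) for ref in references]
    hits = []
    for name, value in sample_pct.items():
        ref_values = [ref[name] for ref in ref_pcts]
        centre, spread = mean(ref_values), stdev(ref_values)
        if abs(value - centre) > n_std * spread:
            hits.append(name)
    return hits
```

A sample is treated as a hit for the fraction when the returned list is non-empty; `n_std` would be set to 3 or 5 depending on the compound.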

TABLE 1 Tobacco Metabolites Monitored in Fractions 2, 4 & Carbohydrates by Quantitative Analysis
Fraction 2 (Fatty Acids) | Fraction 4 (Amino Acids) | Carbohydrates
undecanoic acid methyl ester* (C11:0) | Aspartic Acid (ASP) | Arabinose
Pentadecanoic acid methyl ester** (C15:0) | Glutamic Acid (GLU) | Rhamnose
Pentadecanoic acid ethyl ester** (C15:0) | Serine (SER) | Xylose
palmitic acid methyl ester (C16:0) | Histidine (HIS) | Mannose
palmitoleic acid methyl ester (C16:1) | Glycine (GLY) | Galactose
iso methylpentadecanoic acid methyl ester (C16:0:Me) | Threonine (THR) | Galacturonic Acid
palmitoleic acid methyl ester (C16:2) | Alanine (ALA) | Glucose
palmitolenic acid methyl ester (C16:3) | Arginine (ARG) |
iso methylhexadecanoic acid methyl ester (C17:0Me) | Tyrosine (TYR) |
Stearic acid methyl ester (C18:0) | Cystine (CY2) |
Oleic acid methyl ester (C18:1) | Valine (VAL) |
Linoleic acid methyl ester (C18:2) | Methionine (MET) |
Linolenic acid methyl ester (C18:3) | Norvaline* (NVA) |
Arachidic acid methyl ester (C20:0) | Tryptophan (TRP) |
Lignoceric acid methyl ester* (C24:0) | Phenylalanine (PHE) |
 | Isoleucine (ILE) |
 | Leucine (LEU) |
 | Lysine (LYS) |
 | Sarcosine* (SAR) |
 | Proline (PRO) |

H. Shipping Hits. Any F1, F2, F3, or Carbohydrate Fractions identified as hits by ADA and/or quantitative analysis, and the most typical null for each fraction for each sample set as identified by ADA, were sent to the Function Discovery Laboratory of Analytical Sciences Capability within Corporate R&D (The Dow Chemical Company; Midland, Mich.) for structural characterization and quantification of relative change of the specific biochemical compounds altered (see Example 15). Samples were sealed, packaged on dry ice and shipped for overnight delivery.

Example 15 Identification of Metabolic Changes

This Example describes the identification of the chemical nature of genetic modifications made in tobacco plants using GENEWARE™ viral vector technology. The protocols involved the use of gas chromatography/mass spectrometry (GC/MS) for the analyses of three primary fractions obtained from extraction and fractionation processes.

A. Methods. Major instruments and accessories used included bioinformatics computer programs (see the description of the Maxwell program in WO 02/10486, hereby incorporated by reference); mass spectral libraries [including Biotech FDL, which is also described in WO 02/10486, and two commercial libraries: NIST Standard Reference Database (NIST98; National Institute of Standards and Technology) and the Wiley Registry of Mass Spectral Data (WILEY275; John Wiley and Sons, Inc.)]; biotechnology database (FDL, described in WO 02/10486), which is based on the MICROSOFT ACCESS database program from MICROSOFT (Redmond, Wash.) and utilizes ACCORD FOR ACCESS (available from Accelrys Inc.; San Diego, Calif.) to incorporate chemical structures; BLIMS, a customized LIMS (Nautilus 99; Thermo LabSystems Ltd., Manchester, England) for sample tracking and information transfer; biotechnology database (eBRAD; Dow/DAS/LSBC), an ORACLE (Redwood Shores, Calif.) based relational database that is a depository containing data from various screens and associated sequencing data; HP Model 6890 capillary gas chromatograph (GC; Agilent Technologies); HP Model 5973 Mass Selective Detector (MSD; Agilent Technologies); auto-sampler and sample preparation station (LEAP Technologies); large volume injector system (APEX); Ultra Freezer (Revco); and ChemStation GC/MS Software (Agilent Technologies).

Subject samples (those exhibiting an altered metabolic characteristic) and corresponding References (also referred to as controls or nulls) were shipped via overnight mail from the Metabolic Screening Laboratory (Indianapolis, Ind.) to the Function Discovery Laboratory (Midland, Mich.). Samples were removed from the shipping container, inspected for damage, and then placed in a freezer until analysis by GC/MS.

Samples were received in vials or in titer plates with a titer plate (TP) number, also referred to as a Rack Identification number, that was used to track the sample in the BLIMS system. The titer plate number was used by the FDL to extract from BLIMS pertinent information from ADA (Automated Data Analysis chromatographic pattern recognition software) HIT reports and/or QUANT (a quantitative data analysis approach that makes use of individual peak areas of select peaks corresponding to specific compounds of interest in the fatty acid Fraction 2) HIT reports generated by the Metabolic Screening Laboratory. The information in these reports included the well position of the respective HITs (Subject), the corresponding well position of the Reference, and other pertinent information, such as aliquot identification. This information was used to generate ChemStation and LEAP sequences for FDL analyses.

Samples were sequenced for analysis in the following order:

TABLE 2 Analysis Order
Solvent Blank
Instrument Performance Standard
Subjects and Associated Reference
. . .
Performance Standard
Solvent Blank

Samples were analyzed on GC/MS systems using the following procedures. Fraction 1 samples were shipped dry and required a hexane reconstitution step. Fraction 2 and Fraction 3 samples were analyzed as received. Internal standards and surrogate standards were added to the samples prior to GC/FD analysis (see Example 14).

B. Fraction 1 Analysis. The name of the GC/MS method used is BIONEUTx (where x is a revision number of the core GC/MS method). The method is retention-time locked to the retention time of pentacosane, an internal standard, using the ChemStation RT Locking algorithm.

Internal/Surrogate* Standard(s)

-   Pentacosane
-   Hexatriacontane
-   *Pentadecanoic acid, ethyl ester

Chromatography
Column: J&W DB-5MS, 50 m × 0.320 mm × 0.25 μm film
Mode: constant flow
Flow: 2.0 mL/min
Detector: MSD
Outlet psi: vacuum
Oven: 40° C. for 2.0 min; 20° C./min to 350° C., hold 15.0 min
Equilibration time: 1 min
Inlet:
Mode: split
Inj Temp: 250° C.
Split ratio: 50:1
Gas Type: Helium

LEAP Injector
Injector:
Inj volume: optimized to pentacosane peak intensity (typically 20 μL)
Sample pumps: 2
Wash solvent A: Hexane
Wash solvent B: Acetone
Preinj Solvent A washes: 2
Preinj Solvent B washes: 2
Postinj Solvent A washes: 2
Postinj Solvent B washes: 2

APEX Injector
Method Name: BIONEUTx (where x is a revision number of the core APEX method).
Modes:
Initial: Standby (GC Split)
Splitless: (Purge Off) 0.5 min
GC Split: (Standby) 4 min
ProSep Split: (Flow Select) 23 min
Temps: 50° C. for 0.0 min; 300° C./min to 350° C., hold for 31.5 min

Mass Spectrometer
Scan: 35-800 Da at sampling rate 2 (1.96 scans/sec)
Solvent delay: 4.0 min
Detector:
EM absolute: False
EM offset: 0
Temps:
Transfer line: 280° C.
Ion source: 150° C.
MS Source: 230° C.

C. Fraction 2 Analysis. The name of the GC/MS method used is BIOFAMEx (where x is a revision number of the core GC/MS method). The method is retention-time locked to the RT of undecanoic acid, methyl ester, an internal standard, using the ChemStation RT Locking algorithm.

Internal/Surrogate* Standard(s)

-   Undecanoic acid, methyl ester
-   Tetracosanoic acid, methyl ester
-   *Pentadecanoic acid, methyl ester
-   *Pentadecanoic acid, ethyl ester

Chromatography
Column: J & W DB-23 FAME, 60 m × 0.250 mm × 0.15 μm film
Mode: constant flow
Flow: 2.0 mL/min
Detector: MSD
Outlet psi: vacuum
Oven: 50° C. for 2.0 min; 20° C./min to 240° C., hold 10.0 min
Equilibration time: 1 min
Inlet:
Mode: split
Inj Temp: 240° C.
Split ratio: 50:1
Gas Type: Helium

LEAP Injector
Injector:
Inj volume: optimized to undecanoic acid, methyl ester peak intensity (typically 10 μL)
Sample pumps: 2
Wash solvent A: Methanol
Wash solvent B: Methanol
Preinj Solvent A washes: 2
Preinj Solvent B washes: 2
Postinj Solvent A washes: 2
Postinj Solvent B washes: 2

APEX Injector
Method Name: BIOFAMEx (where x is a revision number of the core APEX method).
Modes:
Initial: GC Split
Splitless: 0.5 min
GC Split: 4 min
ProSep Split: 21 min
Temps: 60° C. for 0.5 min; 300° C./min to 250° C., hold for 20 min; 300° C./min to 260° C., hold for 5 min

Mass Spectrometer
Scan: 35-800 Da at sampling rate 2 (1.96 scans/sec)
Solvent delay: 4.5 min
Detector:
EM absolute: False
EM offset: 0
Temps:
Transfer line: 200° C.
Ion source: 150° C.
MS Source: 230° C.

D. Fraction 3 Analysis. The name of the GC/MS method used is BIOAQUAx (where x is a revision number of the core GC/MS method). The method is retention-time locked to the RT of n-octyl-β-D-glucopyranoside, an internal standard, using the ChemStation RT Locking algorithm.

Internal/Surrogate* Standard(s)

-   n-Octyl-β-D-glucopyranoside
-   *Tetrachlorobenzene
-   *p-Hydroxybenzoic acid

Chromatography
Column: Chrompack 7454 CP-SIL 8, 60 m × 0.320 mm × 0.25 μm film
Mode: constant flow
Flow: 2.0 mL/min
Detector: MSD
Outlet psi: vacuum
Oven: 40° C. for 2.0 min; 20° C./min to 350° C., hold 10.0 min
Equilibration time: 1 min
Inlet:
Mode: split
Inj Temp: 250° C.
Split ratio: 50:1
Gas Type: Helium

LEAP Injector
Injector:
Inj volume: optimized to n-octyl-β-D-glucopyranoside peak intensity (typically 2.5 μL)
Sample pumps: 2
Wash solvent A: Hexane
Wash solvent B: Acetone
Preinj Solvent A washes: 2
Preinj Solvent B washes: 2
Postinj Solvent A washes: 2
Postinj Solvent B washes: 2

APEX Injector
Method Name: BIOAQUAx (where x is a revision number of the core APEX method).
Modes:
Initial: GC Split
Splitless: 0.5 min
GC Split: 4 min
ProSep Split: 20 min
Temps: 60° C. for 0.5 min; 300° C./min to 350° C., hold for 21.1 min

Mass Spectrometer
Scan: 35-800 Da at sampling rate 2 (1.96 scans/sec)
Solvent delay: 4.0 min
Detector:
EM absolute: False
EM offset: 0
Temps:
Transfer line: 280° C.
Ion source: 150° C.
MS Source: 230° C.

E. Performance Standard. Two mixtures were used as instrument performance standards. One standard was run with Fraction 1 and 3 samples and the second was run with Fraction 2 samples. Below is the composition of the standards as well as approximate retention time values observed when run under the GC/MS conditions previously described. These retention time values are subject to change depending upon specific instrument and chromatographic conditions.

TABLE 3 Fraction 1 and 3 Performance Standard
Time | Compound
6.25 | dimethyl malonate
7.25 | dimethyl succinate
8.15 | dimethyl glutarate
8.98 | dimethyl adipate
11.06 | dimethyl azelate
11.42 | hexadecane
11.70 | dimethyl sebacate
13.57 | eicosane
15.36 | tetracosane
16.88 | octacosane
18.26 | dotriacontane
19.95 | hexatriacontane

TABLE 4 Fraction 2 Performance Standard
Time | Compound
8.82 | undecanoic acid, methyl ester
9.32 | dodecanoic acid, methyl ester
10.24 | tetradecanoic acid, methyl ester
11.07 | hexadecanoic acid, methyl ester
11.84 | octadecanoic acid, methyl ester
11.90 | oleic acid, methyl ester
12.14 | linoleic acid, methyl ester
12.39 | linolenic acid, methyl ester
12.60 | eicosanoic acid, methyl ester
13.42 | docosanoic acid, methyl ester

F. Data Analysis. Subject and Reference data sets were processed using the Bioinformatics computer program Maxwell. The principal elements of the program are 1) Data Reduction, 2) Two-Dimensional Peak Matching, 3) Quantitative Peak Differentiation (Determination of Relative Quantitative Change), 4) Peak Identification, 5) Data Sorting, and 6) Customized Reporting.

The program queries the user for the filenames of the Reference data set and Subject data set(s) to compare against the Reference. A complete listing of user inputs with example input is shown below.

TABLE 5 Bioinformatics Analysis
USER QUERY | EXAMPLE USER INPUT
Operator Name | M. Maxwell
Total number of data files to process | 5
Which Fraction | 3
Reference (Control) File Name | AAPR0020.D
Process a specific RT Range | Y
Specific RT range | 6.5-23
Internal Standard Retention Time | 14.902
+/− variation in Internal Std. RT | .004
Variation in peak RI, ChemStation | .005
Percent variation in peak RI, Biotech Database | .010
Threshold for determining Area % change | 60
Spectral Matching Value (Threshold MS-XCR for peaks to be a match) | .95
Percent to determine LOP-PM* Value | 1
Percent to determine LOP-SRT** Value | 3
Quality Level for Library (Library match) | 80
Subtract Background | Y
Time Range for Background | 21.5-22.6
SHORT SUMMARY (y/n, y = no chromatograms) | Y
*LOP-PM - Limit of Processing for Peak Matching
**LOP-SRT - Limit of Processing for Sorting

The program integrates the Total Ion Chromatogram (TIC) of the data sets using Agilent Technologies HP ChemStation RTE integrator parameters determined by the analyst. The corresponding raw peak areas are then normalized to the respective Internal Standard peak area. It should be noted that before the normalization is performed, the program chromatographically and spectrally identifies the Internal Standard peak. Should the identification of the Internal Standard not meet established criteria for a given Fraction, then the data set will not be further processed and it will be flagged for analyst intervention.

Peak tables from the Reference and each Subject were generated. The peak tables are composed of retention time (RT), retention index (RI; the retention time relative to the Internal Standard RT), raw peak areas, peak areas normalized to the Internal Standard, and other pertinent information.

The first of two filtering criteria, established by the analyst, was then invoked and must be met before a peak is further processed. The criterion is based upon a peak's normalized area. All normalized peaks having values below the Limit of Processing for Peak Matching (LOP-PM) were considered to be “background”. These “peaks” were not carried forth for any type of mathematical calculation or spectral comparison.
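For illustration only, the construction of a peak table and the first filtering criterion may be sketched as follows. This is not the Maxwell program. It assumes that the retention index is the ratio of a peak's RT to the Internal Standard RT and that the LOP-PM has already been expressed as a normalized-area value; the function names and the dictionary layout are hypothetical.

```python
def build_peak_table(peaks, is_rt, is_area):
    """Build a peak table from (rt, raw_area) pairs.

    RI is taken as rt / is_rt (an assumption about 'relative to the
    Internal Standard RT'); areas are normalized to the Internal
    Standard peak area.
    """
    table = []
    for rt, raw_area in peaks:
        table.append({
            "rt": rt,
            "ri": rt / is_rt,
            "raw_area": raw_area,
            "norm_area": raw_area / is_area,
        })
    return table


def apply_limit_of_processing(table, limit):
    """Keep peaks whose normalized area meets the limit; the rest are background."""
    return [p for p in table if p["norm_area"] >= limit]
```

The same filter function can serve for the second criterion (LOP-SRT) by passing the corresponding limit.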

In the initial peak-matching step, the Subject peak table was compared to the Reference peak table and peaks between the two were paired based upon their respective RI values matching one another (within a given variable window). The next step in the peak matching routine utilized mass spectral data. Subject and Reference peaks that have been chromatographically matched were then compared spectrally. The spectral matching was performed using a mass spectral cross-correlation algorithm within the Agilent Technologies HP ChemStation software. The cross-correlation algorithm generates an equivalence value based upon spectral “fit” that was used to determine whether the chromatographically matched peaks are spectrally similar or not. This equivalence value is referred to as the MS-XCR value and must meet or exceed a predetermined value for a pair of peaks to be “MATCHED”, which means they appear to be the same compound in both the Reference and the Subject. The MS-XCR value can also be used to judge peak purity. This two-dimensional peak matching process was repeated until all potential peak matches were processed. At the end of the process, peaks are categorized into two categories, MATCHED and UNMATCHED.
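The two-dimensional matching described above may be sketched as follows, for illustration only. The ChemStation cross-correlation algorithm is not reproduced here; cosine similarity between spectra is substituted as a stand-in for the MS-XCR value. The default window (.005) and threshold (.95) are taken from the example inputs of Table 5; all function names and the peak dictionary layout are hypothetical.

```python
import math


def spectral_similarity(a, b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts.

    A stand-in for the MS-XCR equivalence value, not the ChemStation algorithm.
    """
    dot = sum(a[mz] * b.get(mz, 0.0) for mz in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_peaks(reference, subject, ri_window=0.005, min_similarity=0.95):
    """Pair Subject and Reference peaks by RI window, then by spectrum.

    Returns (matched, unmatched_reference, unmatched_subject), where
    matched is a list of (reference_peak, subject_peak, similarity).
    """
    matched, used_ref, used_sub = [], set(), set()
    for j, s in enumerate(subject):
        best = None
        for i, r in enumerate(reference):
            if i in used_ref or abs(s["ri"] - r["ri"]) > ri_window:
                continue  # fails the chromatographic (first) dimension
            score = spectral_similarity(s["spectrum"], r["spectrum"])
            if score >= min_similarity and (best is None or score > best[1]):
                best = (i, score)
        if best is not None:
            used_ref.add(best[0])
            used_sub.add(j)
            matched.append((reference[best[0]], s, best[1]))
    unmatched_reference = [r for i, r in enumerate(reference) if i not in used_ref]
    unmatched_subject = [s for j, s in enumerate(subject) if j not in used_sub]
    return matched, unmatched_reference, unmatched_subject
```

Each Reference peak is paired at most once; when several candidates fall within the RI window, the one with the highest similarity is chosen, which is a design choice of this sketch rather than something stated in the specification.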

A second filtering criterion was next invoked, again based upon the normalized area of the MATCHED or UNMATCHED peak. For a peak to be reported and further processed, its normalized area must meet or exceed the predetermined Limit of Processing for Sorting (LOP-SRT).

Peaks that are UNMATCHED are immediately flagged as different. UNMATCHED peaks are of two types. There are those that are reported in the Reference but appear to be absent in the Subject (based upon criteria for quantitation and reporting). These peaks were designated in the Analyst Report with a percent change of “−100 percent” and the description “UNMATCHED IN SAMPLE”. The second type of peaks are those that were not reported in the Reference (again, based upon criteria for quantitation and reporting) but were reported in the Subject, thus appearing to be “new” peaks. These peaks were designated in the Analyst Report with a percent change of “100 percent” and the description “NEW PEAK UNMATCHED IN NULL”.

MATCHED peaks were processed further for relative quantitative differentiation. This quantitative differentiation is expressed as a percent change of the Subject peak area relative to the area of the Reference peak. A predetermined threshold for change must be observed for the change to be determined biochemically and statistically significant. The change threshold is based upon previously observed biological and analytical variability factors. Only changes above the threshold for change were reported.
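The relative quantitative differentiation may be sketched as follows, for illustration only. The formula is inferred from the statement elsewhere in this Example that a change of 100% corresponds to a doubling in the Subject relative to the Reference. The default threshold of 60 is the example input of Table 5; applying it to the absolute value of the change is an assumption, and the function names are hypothetical.

```python
def percent_change(subject_area, reference_area):
    """Percent change of the Subject normalized area relative to the Reference."""
    return 100.0 * (subject_area - reference_area) / reference_area


def is_reportable(change, threshold=60.0):
    """True when the magnitude of the change exceeds the threshold.

    Treating increases and decreases symmetrically is an assumption.
    """
    return abs(change) > threshold
```

For example, a Subject area twice that of the Reference gives 100.0, and a Subject area half that of the Reference gives -50.0, which would not be reported at a threshold of 60.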

Peaks were then processed through the peak identification process as follows. The mass spectra of the peaks were first searched against mass spectral plant metabolite libraries. The equivalence value assigned to the library match was used as an indication of a proper identification.

To provide additional confirmation to the identity of a peak, or to suggest other possibilities, library hits were searched further against a Biotechnology database (FDL; Dow). The Biotechnology database is based on the MICROSOFT ACCESS database program from MICROSOFT and utilizes ACCORD FOR ACCESS (available from Accelrys Inc.) to incorporate chemical structures into the database.

The Compound Identification Number (CIN) of the compound from the library was searched against those contained in the database. If a match was found, the CIN in the database was then correlated to the data acquisition method for that record. If the method was matched, the program then compared the retention index (RI), in the Peak Table, of the component against the value contained in the database for that given method. Should the RIs match (within a given window of variability) then the peak identity was given a high degree of certainty. Components in the Subject that are not identified by this process were assigned a unique Compound Identification Number based upon Fraction Number and RI (example: F1-U0.555). The unique identifier was used to track unknown components. The program then sorts the data and generates an Analyst Report.
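The confirmation steps described above may be sketched as follows, for illustration only. The database layout (`{CIN: {method: RI}}`), the function name, the status strings, and the absolute RI window are hypothetical; the three-decimal format of the unknown identifier is inferred from the example “F1-U0.555”.

```python
def identify_peak(peak_ri, library_cin, method, database, fraction, ri_window=0.010):
    """Confirm a library hit against the database, else assign an unknown ID.

    database maps CIN -> {acquisition method: expected RI} (assumed layout).
    Returns (identifier, status).
    """
    record = database.get(library_cin) if library_cin is not None else None
    if record is not None and method in record:
        if abs(peak_ri - record[method]) <= ri_window:
            return library_cin, "high certainty"
    # Not confirmed: build a tracking ID from Fraction Number and RI.
    return f"F{fraction}-U{peak_ri:.3f}", "criteria not met"
```

A component whose library CIN is absent from the database, recorded under a different method, or outside the RI window falls through to the unknown identifier.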

An Analyst Report is an interim report consisting of PBM algorithm match quality value (equivalence value), RT, Normalized Peak Area, RI (Sample), RI (database), Peak Identification status [peak identity of high certainty (peaks were identified by the program based on the pre-established criteria) or criteria not met (program did not positively identify the component)], Component Name, CIN (a unique identifier, which could be a CAS number), Mass Spectral Library (containing spectrum most closely matched to that of the component), Unknown ID (unique identifier used to track unidentified components), MS-XCR value, Relative % Change, Notes (MATCHED/UNMATCHED), and other miscellaneous information. The Analyst Report was reviewed manually by the analyst who determined what further analysis was necessary. The analyst also generated a modified report, for further processing by the program, by editing the Analyst Report accordingly.

For Fractions 2 and 3, derivatization procedures were performed prior to analysis to make certain components more amenable to gas chromatography. Thus, the compound names in the modified analyst report (MAR) were those of the derivatives. To accurately reflect the true components of these fractions, the MAR was further processed using information contained in an additional database. This database cross-references the observed derivatized compound to that of the original, underivatized “parent” compound by way of their respective compound identification numbers and replaces derivatives with parent names and information for the final report.

The Modified Analyst Report also contains a HIT Score of 0, 1, or 2. The value is assigned by the analyst to the data set of the Subject aliquot based on the following criteria:

0 No FDL data on Subject
1 FDL data collected; Subject not FDL HIT
2 FDL data collected; Subject is FDL HIT

An FDL HIT is defined as a reportable percent change (modification) observed in a Subject relative to Reference in a component of biochemical significance.

An electronic copy of the final report is entered into the Nautilus LIMS system (BLIMS) and subsequently into eBRAD (Biotech database). The program also generated a hardcopy of the pinpointed TIC and the respective mass spectrum of each component that was reported to have changed.

“NQ” and “NEW” are two terms used in the final report. Both terms refer to UNMATCHED peaks whose percent changes cannot be reported in a numerically quantitative fashion. These terms are defined as follows:

-   “NQ” is used in the case where there was a peak reported in the Reference for which there was no match in the Subject (either because there was no peak in the Subject or, if there was, the area of the peak did not satisfy the Limit of Processing for Peak Matching). The percent change designation of “−100%” used in the Analyst Report is replaced with “NQ”.
-   “NEW” is used in those situations where a peak was reported in the Subject but for which there was no corresponding match in the Reference (either because there was no peak in the Reference or, if there was, the area of the peak did not satisfy the Limit of Processing for Peak Matching). For these situations, the percent change designation of “100%” used in the Analyst Report is replaced with “NEW”. The designation of “NEW” in the final report to a component that is present in the Sample but not in the Reference was necessary to eliminate any ambiguity with the appearance of “100%” for MATCHED peaks. A “100%” designation in the final report exclusively refers to a component with modification that doubled in the Subject relative to the Reference.
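For illustration only, the replacement of the Analyst Report designations in the final report may be sketched as follows. The function name and the use of the Analyst Report description strings as inputs are assumptions; the formatting of MATCHED changes as a whole-number percent is likewise an assumption.

```python
def final_report_change(note, percent):
    """Map an Analyst Report entry to its final-report change designation.

    note is the Analyst Report description; percent is its percent change.
    """
    if note == "UNMATCHED IN SAMPLE":
        return "NQ"   # replaces the Analyst Report value of -100%
    if note == "NEW PEAK UNMATCHED IN NULL":
        return "NEW"  # replaces the Analyst Report value of 100%
    return f"{percent:.0f}%"  # MATCHED peaks keep a numeric change
```

With this mapping, “100%” in the final report can arise only from a MATCHED peak, that is, a component that doubled in the Subject relative to the Reference.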

G. Results. The results of the identification of metabolic changes are shown in FIG. 6.

Example 16 Bioinformatic Analysis of Hits

A. Phred and Phrap. Phred is a UNIX based program that can read DNA sequencer traces and make nucleotide base calls independent of any software provided by the DNA sequencer manufacturer. Phred also provides a quality score for each base that can be used by the investigator to trim those sequences or preferably by Phrap to help its assembly process.

Phrap is another UNIX based program which takes the output of Phred and tries to assemble the individual sequencing runs into larger contiguous segments on the assumption that they all belong to a single DNA molecule. While this is clearly not the case with collections of Expressed Sequence Tags (ESTs) or with heterogeneous collections of sequencing runs belonging to more than one contiguous segment, the program does a very good job of uniquely assembling these collections with the proper manipulation of its parameters (mainly -penalty and -minscore; settings of 15 and 40 respectively provide contiguous sequences with exact homology approaching 95% over lengths of approximately 50 nucleotide base pairs or more). As with all assemblies it is possible for proper assemblies to be missed and for improper assemblies to be constructed, but the use of the above parameters and judicious use of input sequences will keep these to a minimum.

Detailed descriptions of the Phred and Phrap software and its use can be found in the following references which are hereby incorporated herein by reference: Ewing et al., Genome Res. 8:175 [1998]; Ewing & Green, Genome Res. 8:186 [1998]; Ewing et al., Genome Res. 8:195 [1998].

BLAST

The BLAST set of programs may be used to compare a set of sequences against databases composed of large numbers of nucleotide or protein sequences and obtain homologies to sequences with known function or properties. Detailed description of the BLAST software and its uses can be found in the following references which are hereby incorporated herein by reference: Altschul et al., J. Mol. Biol. 215:403 [1990]; Altschul et al., J. Mol. Biol. 219:555 [1991].

Generally, BLAST performs sequence similarity searching and is divided into 5 basic subroutines of which 3 were used: (1) BLASTN compares a nucleotide sequence to a nucleic acid sequence database; (2) BLASTX compares translated protein sequences from a nucleotide sequence done in six frames to a protein sequence database; (3) TBLASTX compares translated protein sequences from a nucleotide sequence done in six frames to the six frame translation of a nucleotide database. BLASTX and TBLASTX are used to identify homologies at the protein level of the nucleotide sequence.

B. Contig Sequence Assembly for Hits. Phred sequence calls and quality data for the individual sequencing runs associated with the above SEQ IDs are stored in a relational database. All the sequence runs stored in the database for the SEQ IDs to be assembled were extracted from the database and the files needed by Phrap recreated with the aid of a Perl script. Perl is an interpreted computer language useful for data manipulation. The same script ran Phrap on the assembled files and then stored the assembled contiguous sequences and singletons in a relational database. The script then assembled two files. One file was a FASTA format file of the sequences of the assembled contigs and singletons. The other file was a record of the assembled sequences and which sequencing runs they contained. FASTA format is a standard DNA sequence format recognized by the BLAST suite of programs as well as by Phrap. Both of these files were then inspected manually to detect incorrect assemblies or to add sequence information not present in the relational database. Any incorrect assemblies found were corrected before this file was used in BLAST searches to identify function as well as other homologous sequences in our databases. Correct assemblies that contained more than one SEQ ID were separated. Although these represent parts of the same sequence, since these are ESTs and contain limited gene sequence data, a one-to-one nucleotide match cannot be predicted at this time for the entire length of a contig representing a single SEQ ID with those containing multiple SEQ IDs. Some full length sequences were obtained and are designated with an FL.

C. Identification of Function. The FASTA formatted file obtained as described above was used to run a BLASTX query against the GenBank non-redundant protein database using a Perl script. The data from this analysis was parsed out by the Perl script such that the following information was extracted: the query sequence name, the level of homology to the hit and the description of the hit sequence (the highest scoring hit from the analysis). The script filtered all hits less than 1.00E-04, to eliminate spurious homologies. The data from this file was used to identify putative functions and properties for the query sequences.
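For illustration only, the parsing and filtering performed by the script may be sketched in Python rather than Perl. The sketch assumes that the “level of homology” is a BLAST expectation (E) value, that “hits less than 1.00E-04” means hits weaker than that value (E greater than 1.00E-04), and that the highest scoring hit is the one with the lowest E value. The input layout of (query, description, E value) tuples and the function name are hypothetical.

```python
def top_hits(hits, max_evalue=1e-4):
    """Return {query: (description, evalue)} for the best hit per query.

    hits is an iterable of (query, description, evalue) tuples (assumed
    layout). Hits with evalue above max_evalue are discarded as spurious.
    """
    best = {}
    for query, description, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (description, evalue)
    return best
```

Queries with no hit passing the cutoff are simply absent from the result, so no putative function would be assigned to them.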

D. Identification of Similar Sequences in Derwent™. The FASTA formatted file obtained as described above was used to run a BLASTN query against the Derwent™ non-redundant nucleotide database as well as a BLASTX against the Derwent™ non-redundant protein database using Perl scripts. These Derwent™ non-redundant databases were created by extracting all the sequence information in the Derwent™ database. The data from this analysis was parsed out by the Perl script such that the following information was extracted: the query sequence name, the level of homology to the hit and the description of the hit sequence (the highest scoring hit from the analysis). The script filtered all hits less than 1.00E-04, to eliminate spurious homologies.

E. Identification of Homologous Sequences. An internal relational database contains sequences from a large number of SEQ IDs belonging to a diverse group of organisms. In order to identify sequences in the database with high levels of homology to the sequences functionally identified as hits and contained in the FASTA formatted file described above, the following analysis was performed.

All the sequences were extracted in FASTA format from our relational database with standard SQL commands and converted into a searchable BLAST database using tools provided in the BLAST download from the National Center for Biotechnology Information (NCBI). A Perl script then ran a BLASTN search of our query file against our internal nucleotide database containing all relevant sequences. The script then extracted from all hits the following information: the query name, the level of homology and the hit seqID. The script then filtered all homologies less than 1.00E-04 as well as all the redundant seqIDs.

This analysis was repeated using a TBLASTX query. Both files were then combined and the redundancies eliminated. Since the query sequences are also present in the database, these redundancies were manually eliminated from the results file. Lastly, all hit SEQ IDs with homology scores less than 1.00E-20 were filtered from the results list.
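The combination of the BLASTN and TBLASTX results may be sketched as follows, for illustration only. As above, the homology level is assumed to be an E value, with “less than 1.00E-20” read as weaker than that value. The tuple layout, the function name, and the automatic removal of hits that are themselves query sequences (done manually in the specification) are assumptions.

```python
def merge_hits(blastn_hits, tblastx_hits, query_ids, max_evalue=1e-20):
    """Combine two hit lists into {(query, hit_id): best evalue}.

    Each hit is a (query, hit_id, evalue) tuple (assumed layout).
    Hits that are themselves query sequences are dropped, duplicates are
    collapsed to their lowest evalue, and weak hits are filtered out.
    """
    best = {}
    for query, hit_id, evalue in list(blastn_hits) + list(tblastx_hits):
        if hit_id in query_ids:
            continue  # query sequences are also present in the database
        if evalue > max_evalue:
            continue
        key = (query, hit_id)
        if key not in best or evalue < best[key]:
            best[key] = evalue
    return best
```

Keying on the (query, hit) pair keeps one entry per pair regardless of which search found it.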

These results were used to extract the sequence and quality score data from the relational database in order to repeat the analysis described in “Contig Sequence Assembly for Hits”. The final product consisted of two files. One file (FIG. 1) contains a record of the assembled sequences and the sequencing runs they include. The other file (FIG. 2) lists the search hits with homologies better than 1.00E-20 to the query contigs and singletons.

All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described compositions and methods of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with particular preferred embodiments, it should be understood that the inventions claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in the art and in fields related thereto are intended to be within the scope of the following claims.

1. The isolated nucleic acid SEQ ID NO:3459.
2. A vector comprising a nucleic acid according to claim 1.
3. The vector of claim 2 wherein said nucleic acid is operably linked to a plant promoter.
4. The vector of claim 2 wherein said nucleic acid is in sense orientation.
5. The vector of claim 2 wherein said nucleic acid is in antisense orientation.
6. A plant transfected with a nucleic acid according to claim 1 wherein said transfection is selected from the group consisting of calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral infection, and biolistics.
7. The plant of claim 6 wherein said transfection is by means of a vector.
8. A plant organ from the plant of claim 6, said organ selected from the group consisting of a seed, leaf, root and stem.
9. A process for making a transgenic plant comprising: a. providing a nucleic acid according to claim 1, and a plant, and b. transfecting said plant with said nucleic acid.
10. The process of claim 9 wherein said transfecting is under conditions such that expression of said nucleic acid confers upon said plant an altered metabolic characteristic selected from the group consisting of an acid, alcohol, fatty acid, branched chain fatty acid, base, alkaloid, amino acid, ester, glyceride, phenolic compound, carbohydrate, sterol, oxygenated terpene, isoprenoid compound, alkene, alkyne, hydrocarbon, ketone and quinone.
11. The process of claim 9 wherein said transfecting is by means of a vector, said vector comprising a nucleic acid according to claim 1.
12. The process of claim 9 wherein said altered metabolic characteristic confers disease resistance upon said plant.